
Don't modify the image's RGB color

Open claforte opened this issue 1 year ago • 3 comments

  • the code (originally from Stable-DreamFusion) that loaded images changed the RGB components of the RGBA image:
    • the additive term `(1 - rgba[..., 3:])` made colors artificially bright (when alpha is between 0 and 1), resulting in halos around the silhouette
    • pre-multiplying by alpha degrades the original color

IMHO neither of these transformations is necessary or desirable. The colors should be preserved as-is.
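As a minimal sketch of the difference (NumPy pseudocode; the function names are mine, not threestudio's), the old behavior effectively composites premultiplied color over a white background, while the proposed behavior keeps RGB untouched and carries alpha separately:

```python
import numpy as np

def load_rgba_old(rgba: np.ndarray) -> np.ndarray:
    # Old (Stable-DreamFusion-style) behavior as described above:
    # premultiply RGB by alpha, then add (1 - alpha), i.e. composite
    # over a white background. Semi-transparent pixels get brightened,
    # producing halos around the silhouette.
    return rgba[..., :3] * rgba[..., 3:] + (1.0 - rgba[..., 3:])

def load_rgba_new(rgba: np.ndarray):
    # Proposed behavior: keep the RGB channels untouched and return
    # alpha separately (e.g. for a mask loss).
    return rgba[..., :3], rgba[..., 3:]

# A 50%-transparent mid-gray pixel: the old code brightens it
# (0.5 * 0.5 + 0.5 = 0.75), the new code leaves it at 0.5.
pixel = np.array([[[0.5, 0.5, 0.5, 0.5]]], dtype=np.float32)
```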

  • I also added "bollywood_actress" images that I generated myself using SD2.2.x.

    • her silhouette has areas with lots of semi-transparent hair
    • the outline of her hair is abnormally bright, but that's SD2.2 simulating hair back-scatter... which isn't simulated in this NeRF model, so it can't be accurately reconstructed in 3D
  • making these changes makes very little difference on Anya and on "Bollywood actress", but I maintain a strong opinion that this is more correct

To illustrate the difference, here are results from early in training (around step 400, I believe):

image

  • Pay attention to the left-most column of each experiment. Those are the NeRF renders for each batch item (batch size = 4).

  • The right column (default) shows the old code. Notice how the sides of the characters' hair are artificially bright.

  • The middle column removes only the additive term `(1 - rgba[..., 3:])`.

  • The left column shows the result of the new code. It preserves the original colors more accurately.

  • `run_zero123_examples.sh` has the commands I ran, although I manually updated the code between experiments.

claforte avatar Jun 16 '23 01:06 claforte

@bennyguo I only tested with the zero123 system. Let me know if you have concerns about other systems.

claforte avatar Jun 16 '23 01:06 claforte

BTW the results are in https://stability.wandb.io/threestudio/claforte-dont_composite ... and the loss itself didn't change noticeably (since only a tiny fraction of pixels are affected, and eventually all variants of the code converge to roughly the same result).

claforte avatar Jun 16 '23 01:06 claforte

This looks isolated from the text-based methods, as they don't use the image-conditioned dataset. I'm curious why the processed RGBA image has large areas with alpha between 0 and 1? I thought it would have alpha ~= 1 in the foreground and alpha ~= 0 in the background?

bennyguo avatar Jun 19 '23 16:06 bennyguo

Off the top of my head, hair and other semi-transparent surfaces can arise... e.g. stained glass, a car's windows, the thin wings on the baby phoenix, etc.

claforte avatar Jun 23 '23 00:06 claforte

In my opinion, the alpha compositing formulation is correct, but the matting results are inaccurate. Should we just treat the area with alpha larger than some threshold as a hard mask? Using hard masks makes more sense because the more advanced segmentation models we'll probably use (like SAM) also produce hard masks.
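A minimal sketch of that thresholding (the helper name and the 0.5 default are illustrative, not from the codebase):

```python
import numpy as np

def to_hard_mask(alpha: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    # Binarize the matte: pixels with alpha above `threshold` become
    # fully opaque foreground (1.0), everything else background (0.0),
    # matching the hard masks produced by segmentation models like SAM.
    return (alpha > threshold).astype(np.float32)
```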

bennyguo avatar Jun 23 '23 04:06 bennyguo