DSNeRF
Discussion about bad results on scene horns of LLFF dataset
Hi, I tried to run DS-NeRF on the horns scene of the LLFF dataset and got bad results. I am trying to figure out what happened, and it would be really appreciated if someone could discuss it with me.
First, I will show the results of what I got on scene horns:
I did three experiments, each using all images of scene horns (62 images, 54 for training and 8 for testing). The 'original' experiment uses the original NeRF's code and settings. The 'no ndc only' experiment only changes `no_ndc` in the original settings to `True`. The 'no ndc only, depth sup.' experiment uses your code with the settings of `fern_2v.txt` adapted to scene horns (it seems that `no_ndc = True` is the default setting in DS-NeRF).
As we can see, the results of 'no ndc only' are worse than 'original', while the results of 'no ndc only, depth sup.' are even worse. As a sanity check, I would be happy if you could also share your results on scene horns. I checked the COLMAP outputs and found nothing weird, so I am confused about why depth supervision gives such bad results (especially in the rendering of the background). As discussed in the original repo about NDC, we should set `no_ndc` to `False` for LLFF scenes, since:
For unbounded scenes, because the Euclidean coordinates of the sampled 3d points are not bounded, we need to apply NDC trick to map the unbounded coordinates to be bounded.
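For context, here is a rough NumPy sketch of that warp as I understand it from the NeRF paper's appendix (the function follows NeRF's `ndc_rays`; treat it as an illustration, not the exact repo code). It shifts each ray origin to the near plane and then applies the projective map, so z_cam = -near lands at z = -1 and z_cam = -infinity lands at z = 1, i.e., the unbounded scene becomes bounded:

```python
import numpy as np

def ndc_rays(H, W, focal, near, rays_o, rays_d):
    """Warp camera-space rays (camera looks down -z) into NDC space."""
    # Shift each origin to its intersection with the near plane z = -near
    t = -(near + rays_o[..., 2]) / rays_d[..., 2]
    rays_o = rays_o + t[..., None] * rays_d
    # Projective map: x and y are scaled by focal length / image size,
    # and z = 1 + 2*near / z_cam, which sends z_cam in [-near, -inf)
    # to the bounded interval [-1, 1)
    o0 = -2. * focal / W * rays_o[..., 0] / rays_o[..., 2]
    o1 = -2. * focal / H * rays_o[..., 1] / rays_o[..., 2]
    o2 = 1. + 2. * near / rays_o[..., 2]
    d0 = -2. * focal / W * (rays_d[..., 0] / rays_d[..., 2]
                            - rays_o[..., 0] / rays_o[..., 2])
    d1 = -2. * focal / H * (rays_d[..., 1] / rays_d[..., 2]
                            - rays_o[..., 1] / rays_o[..., 2])
    d2 = -2. * near / rays_o[..., 2]
    return np.stack([o0, o1, o2], -1), np.stack([d0, d1, d2], -1)
```

For a ray starting at the camera center and pointing straight down -z, the warped origin sits on the near plane at z = -1, and marching the full warped direction reaches z = 1, the image of infinity.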
So it is reasonable that 'no ndc only' is worse than 'original'. I also thought that depth supervision should help the network learn where the surface is, so that the fine MLP can sample more points around the surface, which may alleviate the problem of unbounded scenes. However, the results of 'no ndc only, depth sup.' become even worse. Does anyone have ideas about this?
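To make that intuition concrete, here is a simplified PyTorch sketch of a depth supervision term; note the L2 form is my simplification, since DS-NeRF's actual loss is a KL divergence that pushes the whole ray termination distribution toward a narrow Gaussian around the COLMAP depth:

```python
import torch

def rendered_depth(weights, z_vals):
    # Expected termination distance along each ray under the
    # volume-rendering weights (one weight per sample)
    return (weights * z_vals).sum(dim=-1)

def depth_sup_loss(weights, z_vals, colmap_depth):
    # Simplified L2 depth supervision on rays through COLMAP keypoints.
    # (DS-NeRF's real loss is a KL term over the ray distribution,
    # not this scalar L2 on the expected depth.)
    return ((rendered_depth(weights, z_vals) - colmap_depth) ** 2).mean()
```

The point is that this term only constrains where the ray terminates, not what color it produces, so it can pull samples toward the surface without directly improving RGB reconstruction.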
Thanks!
Hi, thanks for your interest!
We also sometimes observe suboptimal PSNR when training DS-NeRF on more than 20 images. We suspect this is because the objectives of reconstructing depth and reconstructing RGB don't always coincide, especially when we ask the network to reconstruct fine-grained texture. Adding depth supervision when you already have sufficient training views might not give you as good an RGB reconstruction, but it should produce more reasonable depths and meshes. We're still looking for a good way to make depth supervision work better in the full-view setting.
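For anyone following along, the tension between the two objectives shows up in how the terms are combined; a minimal sketch, where the weight name and value are hypothetical rather than DS-NeRF's defaults:

```python
import torch

def total_loss(rgb_pred, rgb_gt, depth_pred, depth_gt, depth_lambda=0.1):
    # depth_lambda is a hypothetical trade-off weight: larger values favor
    # geometry (depth/mesh quality) at the possible expense of fine RGB
    # texture, which is one way the two objectives can compete
    img_loss = ((rgb_pred - rgb_gt) ** 2).mean()
    depth_loss = ((depth_pred - depth_gt) ** 2).mean()
    return img_loss + depth_lambda * depth_loss
```

With many training views the RGB term alone already pins down the geometry well, so a fixed depth weight can start costing texture detail instead of adding information.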
Happy to discuss more if you are still interested!
Hi, thank you for the reply!
I think it is still unclear why depth supervision can give suboptimal results, especially in the case of more than 20 images. I have also experimented on scene fortress, which has a more compact, bounded scene space (less free space compared with scene horns). I used all images (as before) in this experiment, and the results are:
As expected, the results of 'no ndc only' are still worse than 'original'. On the other hand, unlike scene horns, depth supervision helps the PSNR number (compared with 'no ndc only'). So I am wondering whether depth supervision mainly works on scenes like fortress, which have a more compact, uniform, bounded space for sampling. Also, maybe selecting a better sampling strategy for scenes like horns, which have a larger (unbounded) depth range and much more free space, would prove the effectiveness of depth supervision on these scenes?
By the way, I noticed that `no_ndc` is always set to true in your recently uploaded config files. Have you tried depth supervision with `no_ndc=False`, i.e., depth supervision in NDC space? I am not familiar with NDC space, but I think it re-parameterizes the scene into a more compact, uniform, and bounded space for sampling, and this may be helpful for scenes like horns?
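In case it is useful: supervising depth in NDC space would require warping the COLMAP depths the same way as the samples. From NeRF's NDC derivation (z_ndc = 1 + 2*near / z_cam with z_cam = -depth), a sketch under my reading of the appendix, not something tested in DS-NeRF:

```python
def depth_to_ndc(depth, near=1.0):
    # Warp a Euclidean depth (distance along -z, with depth >= near) to NDC z:
    # depth = near maps to -1 and depth -> infinity approaches +1, so the
    # supervision targets stay bounded like the NDC samples themselves.
    return 1.0 - 2.0 * near / depth
```

One possible caveat is that this warp compresses far depths heavily near z = 1, so COLMAP depth noise on background points would translate into very small NDC differences, which might weaken the supervision exactly where scenes like horns need it.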
Looking forward to your reply!