instant-ngp
Unstable training results
Thanks for your astonishing work. In my own datasets, I used exactly the same parameters to train many times, but the PSNR results were different each time. For example, the three results of PSNR are 23.81, 25.17, 21.43. Is any randomness introduced into the code? Should I avoid this situation?
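For reference, the run-to-run spread of the three PSNR values above can be quantified with plain Python (the numbers are taken from the report; nothing instant-ngp-specific here):

```python
import statistics

# PSNR (dB) from three training runs with identical parameters.
psnrs = [23.81, 25.17, 21.43]

print(statistics.mean(psnrs))    # ~23.47 dB
print(statistics.pstdev(psnrs))  # ~1.55 dB run-to-run standard deviation
```

A standard deviation of roughly 1.5 dB is far larger than what mere floating-point noise would explain.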
Hi there, training should be approximately deterministic since all the random numbers are seeded.
I am saying "approximately" because floating-point addition isn't associative and the order of gradient accumulation depends on thread scheduling, so there can be slight numerical differences. These appear to be somewhat more significant in the NeRF setting than in others.
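To illustrate the floating-point point with a generic Python example (not instant-ngp code): summing the same numbers in a different order can yield different results, which is exactly what happens when thread scheduling changes the accumulation order of gradients.

```python
# Floating-point addition is not associative: grouping matters.
a, b, c = 1e16, -1e16, 1.0

print((a + b) + c)  # 1.0  -- the large terms cancel first
print(a + (b + c))  # 0.0  -- the 1.0 is absorbed into -1e16 and lost
```

With many threads accumulating gradients in a nondeterministic order, each run effectively picks a different grouping, hence the small numerical drift.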
I'll double-check the codebase for other potential sources of non-determinism -- it has been a while since I verified this last time.
In the interim: it would be helpful to see how much of a visual difference the non-determinism makes in your case and to learn about your training parameters. Do you see these large differences after training for a few seconds, or after a few minutes?
I trained a NeRF model. I can see these large differences after training for only a few seconds, and even if I increase the training time to, say, 20 minutes, the results still differ. The following are three training results under the same parameters:
The visual difference is also relatively large, and the three results are as follows:
Hi, @Tom94. Do you have any solution for the above unstable training results? I checked the code but didn't find a way to avoid it.
I have the same problem with unstable results. I ran training three times on the same image dataset; the first time I could see a clear outline of the model, but the second and third runs produced a total mess.
Are you testing on your own dataset? Are its results good? I don't get good results on my own dataset and would like to know why.
I've been training NeRFs with this code recently, and have made a few changes to the COLMAP extraction method, which appears to help significantly with training at the expense of both (relatively minor) added compute time and (more major) RAM usage.
slash-under/instant-ngp@efdd42a851039b689b84b84191e0358dbd35f07d
It's about 1-1.25 GB of RAM per thread at peak during initial extraction for roughly 250 3840x2160 frames, but in my experience it greatly improves NeRF model quality on large datasets.
Sample:
https://user-images.githubusercontent.com/63025235/155902826-58a972c6-4891-4ef5-88c5-1026180abf41.mp4
That's very good to know, thank you very much for pointing it out! (Also love the video!)
I see that a commit was pushed to reduce memory usage -- hopefully the data for my next model will fit now. Is there any way to utilize more than one GPU if the dataset doesn't fit into VRAM?
Edit: looking a bit further, maybe I can use the PyTorch bindings of the underlying library to accomplish something similar.
@Tom94 Could we please have the spam removed? I sent a report to GitHub, so now we're waiting on a response from them...
@slash-under great video! Could you please give some hints on the parameters you used for aerial photos? I followed the tutorial on custom datasets but am still struggling to get anything clear enough.
> I'll double-check the codebase for other potential sources of non-determinism -- it has been a while since I verified this last time.
Hi @Tom94, thanks for your work! Are there any updates on this? I'm training some NeRFs on custom data and measuring the cosine similarity between the weights of the different NeRF instances trained on the same images/depths. This is always ~0.75. Do you think that this can be somehow increased?
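For anyone wanting to reproduce this kind of measurement: a minimal sketch of the cosine-similarity comparison between two trained weight vectors, assuming the weights have been exported as NumPy arrays (the function name and the random data standing in for real weights are hypothetical):

```python
import numpy as np

def weight_cosine_similarity(w1, w2):
    """Cosine similarity between two flattened parameter vectors."""
    v1, v2 = np.ravel(w1), np.ravel(w2)
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))

# Identical weights give a similarity of ~1.0; independently drawn
# random vectors of this size give a value near 0.
rng = np.random.default_rng(0)
w = rng.standard_normal(10_000)
print(weight_cosine_similarity(w, w))  # ~1.0
```

Note that a hash-grid model can represent a very similar radiance field with quite different parameter values, so weight-space similarity is only a rough proxy for how close the rendered outputs are.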