output accumulation value is very negative

Open evonneng opened this issue 2 years ago • 4 comments

After training instant-ngp for a long time (~81,990 steps), I get the following stack trace:

Screenshot from 2022-07-12 10-56-19

evonneng avatar Jul 12 '22 18:07 evonneng

Hmm this is after #104 was merged right? ~~I was still getting NaNs, but never this early (maybe at 140k~200k steps).~~ (edit: I ran the train script several times in parallel; one of the runs did produce a NaN in the RGB MLP at around 90k steps)

As an FYI, -9223372036854775808 is what we get from:

>>> torch.tensor(torch.nan).long()
tensor(-9223372036854775808)
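For reference, that value is exactly the smallest signed 64-bit integer, which is why it reads as "very negative" rather than as an ordinary sample count. Below is a minimal sketch using only the Python standard library. The helper names `looks_like_nan_cast` and `checked_long` are hypothetical and are not part of nerfstudio or PyTorch; they only illustrate how one might spot the sentinel or refuse to cast a non-finite value in the first place.

```python
import math

# Smallest signed 64-bit integer; matches the value printed in the REPL output above.
INT64_MIN = -(2 ** 63)


def looks_like_nan_cast(value: int) -> bool:
    """Hypothetical check: does this integer equal the int64 minimum sentinel?"""
    return value == INT64_MIN


def checked_long(value: float) -> int:
    """Hypothetical guard: convert to int, but raise on NaN or inf instead of
    letting a non-finite float turn into a meaningless integer."""
    if math.isnan(value) or math.isinf(value):
        raise ValueError(f"non-finite value cannot be cast to an integer: {value!r}")
    return int(value)


if __name__ == "__main__":
    print(looks_like_nan_cast(-9223372036854775808))
    print(checked_long(3.7))
```

A guard like this would only surface the problem earlier with a clearer message; it does not address whatever produces the NaN upstream.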

brentyi avatar Jul 12 '22 20:07 brentyi

I'm going to reopen this for now. The NaN can be fixed by changing the precision, but performance worsens.

tancik avatar Jul 18 '22 03:07 tancik

This still seems to be an ongoing problem. I believe it is a known issue, but I'm posting the stack trace as an update.

Error after 174,990 steps: Screen Shot 2022-09-29 at 1 12 01 PM

evonneng avatar Sep 29 '22 20:09 evonneng

Hi @evonneng, could you try the solution in https://github.com/nerfstudio-project/nerfstudio/pull/910 and let me know if it fixes your issue?

nikmo33 avatar Nov 06 '22 12:11 nikmo33