
Splatfacto training crashes after 18k steps when training on VR-NeRF dataset

Open bernard0047 opened this issue 3 months ago • 0 comments

Description

When training Splatfacto on a VR-NeRF dataset, it crashes after 17910 steps with this error:

  File "/root/miniconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

The loss being calculated here loses its gradient after 17910 steps. I verified this by printing it out on the command line but am not sure why this is happening.
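
One way to narrow this down is to check each loss term separately right before the backward pass, instead of printing only the summed loss. The sketch below is a hypothetical debugging helper, not part of nerfstudio: `find_detached_losses` is a made-up name, and it assumes the individual loss terms are available as a dict mapping names to tensors. It only reads the `requires_grad` and `grad_fn` attributes that every `torch.Tensor` has, so it does not import torch itself.

```python
from typing import Any, Dict, List


def find_detached_losses(loss_dict: Dict[str, Any]) -> List[str]:
    """Return the names of loss terms that autograd cannot backpropagate through.

    Hypothetical helper: `loss_dict` is assumed to map a loss name to a
    torch.Tensor. Plain Python numbers are also reported, because they
    carry no autograd information at all.
    """
    detached = []
    for name, value in loss_dict.items():
        # A tensor that is part of the autograd graph has requires_grad=True.
        # If it is False, calling backward() on that tensor alone raises
        # "element 0 of tensors does not require grad and does not have a grad_fn".
        if not getattr(value, "requires_grad", False):
            detached.append(name)
    return detached
```

If every term is reported as detached at the failing step, the summed loss cannot carry a gradient either, which would point at the parameters or the rendering output rather than at a single loss term.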

To Reproduce

Steps to reproduce the behavior:

  1. Download a scene from the VR-NeRF dataset: ns-download-data eyefultower --capture-name apartment --resolution-name jpeg_1k
  2. Train on it: ns-train splatfacto --data eyefultower/apartment/images-jpeg-1k/transforms.json

Additional context

From the viewer it seems like the model isn't learning anything either. I'm also unable to run the viewer after this update.

bernard0047, May 20 '24 21:05