
Gaussian splatting cannot set --auto-scale-poses to False?

Open yifanlu0227 opened this issue 5 months ago • 5 comments

Describe the bug RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn when training 3DGS with --auto-scale-poses False.

To Reproduce Steps to reproduce the behavior:

ns-train gaussian-splatting --experiment-name waymo-102751-res-1-iter-50000-no-scale --max-num-iterations 50000 --pipeline.datamanager.cache-images gpu colmap --data /home/ubuntu/yifanlu/SuGaR/data/waymo-102751 --downscale-factor 1 --auto-scale-poses False

Error:

No Nerfstudio checkpoint to load, so training from scratch.
Disabled comet/tensorboard/wandb event writers
Printing profiling stats, from longest to shortest duration in seconds
Trainer.train_iteration: 2.6733              
VanillaPipeline.get_train_loss_dict: 2.6713              
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/nerfstudio/bin/ns-train", line 8, in <module>
    sys.exit(entrypoint())
  File "/home/ubuntu/yifanlu/nerfstudio/nerfstudio/scripts/train.py", line 262, in entrypoint
    main(
  File "/home/ubuntu/yifanlu/nerfstudio/nerfstudio/scripts/train.py", line 247, in main
    launch(
  File "/home/ubuntu/yifanlu/nerfstudio/nerfstudio/scripts/train.py", line 189, in launch
    main_func(local_rank=0, world_size=world_size, config=config)
  File "/home/ubuntu/yifanlu/nerfstudio/nerfstudio/scripts/train.py", line 100, in train_loop
    trainer.train()
  File "/home/ubuntu/yifanlu/nerfstudio/nerfstudio/engine/trainer.py", line 252, in train
    loss, loss_dict, metrics_dict = self.train_iteration(step)
  File "/home/ubuntu/yifanlu/nerfstudio/nerfstudio/utils/profiler.py", line 112, in inner
    out = func(*args, **kwargs)
  File "/home/ubuntu/yifanlu/nerfstudio/nerfstudio/engine/trainer.py", line 475, in train_iteration
    self.grad_scaler.scale(loss).backward()  # type: ignore
  File "/home/ubuntu/miniconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_tensor.py", line 492, in backward
    torch.autograd.backward(
  File "/home/ubuntu/miniconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Expected behavior No runtime error.


yifanlu0227 avatar Jan 17 '24 15:01 yifanlu0227
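For background on the error itself: this RuntimeError is PyTorch's generic complaint whenever .backward() is called on a tensor that has no autograd history, e.g. a loss built entirely from constants. A minimal standalone repro:

```python
import torch

# A loss with no grad_fn gives autograd nothing to differentiate,
# so backward() raises immediately.
loss = torch.tensor(0.0)

try:
    loss.backward()
except RuntimeError as e:
    print(e)  # element 0 of tensors does not require grad and does not have a grad_fn
```

In a 3DGS pipeline this can happen indirectly: if no Gaussian falls inside any camera frustum, the rendered loss can end up being a constant detached from the parameters.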

Other methods like nerfacto work well with my dataset when --auto-scale-poses is set to False.

yifanlu0227 avatar Jan 17 '24 16:01 yifanlu0227

@yifanlu0227 can you try colmap --auto-scale-poses False instead of just --auto-scale-poses False

maturk avatar Jan 17 '24 17:01 maturk

@maturk Hi! I think I already put colmap before --auto-scale-poses False in my command. How should I modify it? Thanks!

yifanlu0227 avatar Jan 17 '24 17:01 yifanlu0227

@maturk I got this same error RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn on a fresh install. The dozer dataset works fine on nerfacto but throws this error on splatfacto.

commands to repro on my machine:

  1. install from source based on readme
  2. ns-download-data nerfstudio --capture-name=dozer
  3. ns-train splatfacto --data data/nerfstudio/dozer/

During the training run I got the warning message load_3D_points is true, but the dataset was processed with an outdated ns-process-data that didn't convert colmap points to .ply! Update the colmap dataset automatically?, to which I clicked yes.

It works for the library dataset, which also shows the same warning, so I'm not sure what the root cause is.

dlazares avatar Mar 08 '24 02:03 dlazares
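For reference, the warning above concerns the SfM point cloud that splatfacto uses to initialize its Gaussians; datasets processed with an older ns-process-data lack the .ply file. As a rough illustration of what such a file contains (write_ascii_ply is a hypothetical helper, not nerfstudio's actual converter, which reads COLMAP's points3D and writes the PLY for you):

```python
import numpy as np

def write_ascii_ply(points, colors, path):
    """Write an Nx3 float point array and Nx3 uint8 color array as ASCII PLY.

    Illustrative sketch only: it shows the minimal structure a point-cloud
    PLY needs (header, vertex count, per-vertex properties, then rows).
    """
    header = "\n".join([
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
        "property uchar red",
        "property uchar green",
        "property uchar blue",
        "end_header",
    ])
    with open(path, "w") as f:
        f.write(header + "\n")
        for (x, y, z), (r, g, b) in zip(points, colors):
            f.write(f"{x} {y} {z} {r} {g} {b}\n")
```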

I think there are two issues that need to be addressed: (1) the error message RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn isn't very helpful in describing why training crashes. It typically happens when the poses are catastrophically wrong, or when the poses and the initial point cloud are in different coordinate frames.

(2) The fact that --auto-scale-poses False doesn't work may reveal a bug in how the point cloud is transformed.

jb-ye avatar Mar 30 '24 04:03 jb-ye
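jb-ye's second point can be illustrated with a small sketch: any global scale (or similarity transform) applied to the camera poses must also be applied to the initial point cloud, otherwise the Gaussians start outside every camera frustum and the rendered loss can degenerate into a constant with no grad_fn. scale_scene below is a hypothetical helper for illustration, not nerfstudio's API:

```python
import numpy as np

def scale_scene(c2w, points, scale):
    """Apply ONE global scale to camera-to-world poses and 3D points.

    Hypothesis from the thread: if the scale is applied to the poses
    but not to the initial point cloud (or vice versa), the two end up
    in mismatched coordinate frames.
    """
    c2w = c2w.copy()
    c2w[:, :3, 3] *= scale         # scale the camera centers
    return c2w, points * scale     # same factor on the point cloud
```

The fix jb-ye hints at would be to audit whichever code path --auto-scale-poses controls and confirm the point cloud receives the identical transform.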