
Function '_RasterizeToPixelsBackward' returned nan values in its 0th output.

Open insomniaaac opened this issue 1 year ago • 1 comment

trainer.py:203: UserWarning: Anomaly Detection has been enabled. This mode will increase the runtime and should only be enabled for debugging.
  with autograd.detect_anomaly():
/.../python3.9/site-packages/torch/autograd/graph.py:769: UserWarning: Error detected in _RasterizeToPixelsBackward. Traceback of forward call that caused the error:
  File "trainer.py", line 633, in <module>
    cli(main, cfg, verbose=True)
  File "/.../python3.9/site-packages/gsplat/distributed.py", line 360, in cli
    return _distributed_worker(0, 1, fn=fn, args=args)
  File "/.../python3.9/site-packages/gsplat/distributed.py", line 295, in _distributed_worker
    fn(local_rank, world_rank, world_size, args)
  File "trainer.py", line 591, in main
    runner.train()
  File "trainer.py", line 219, in train
    renders, alphas, info = self.rasterize_splats(
  File "trainer.py", line 143, in rasterize_splats
    render_colors, render_alphas, info = rasterization(
  File "rendering.py", line 561, in rasterization
    render_colors_, render_alphas_ = rasterize_to_pixels(
  File "/.../python3.9/site-packages/gsplat/cuda/_wrapper.py", line 551, in rasterize_to_pixels
    render_colors, render_alphas = _RasterizeToPixels.apply(
  File "/.../python3.9/site-packages/torch/autograd/function.py", line 574, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
 (Triggered internally at ../torch/csrc/autograd/python_anomaly_mode.cpp:111.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  0%|                                                                                                                                                               | 0/30000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "trainer.py", line 633, in <module>
    cli(main, cfg, verbose=True)
  File "/.../python3.9/site-packages/gsplat/distributed.py", line 360, in cli
    return _distributed_worker(0, 1, fn=fn, args=args)
  File "/.../python3.9/site-packages/gsplat/distributed.py", line 295, in _distributed_worker
    fn(local_rank, world_rank, world_size, args)
  File "trainer.py", line 591, in main
    runner.train()
  File "trainer.py", line 279, in train
    loss.backward()
  File "/.../python3.9/site-packages/torch/_tensor.py", line 521, in backward
    torch.autograd.backward(
  File "/.../python3.9/site-packages/torch/autograd/__init__.py", line 289, in backward
    _engine_run_backward(
  File "/.../python3.9/site-packages/torch/autograd/graph.py", line 769, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: Function '_RasterizeToPixelsBackward' returned nan values in its 0th output.

I found that the scales and opacity parameters contain some NaN values, so I wrapped the training step in a torch.autograd.detect_anomaly() context. It reported the backward errors above. What should I do to avoid NaN values in scales and opacity?
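For reference, here is a minimal sketch of one way to narrow down where the non-finite values first appear. The parameter names ("means", "scales", "opacities") and the splats/compute_loss/optimizer objects are assumptions for illustration, not gsplat's actual trainer API; adapt them to your own training loop.

import torch

# Sketch only: the dict keys below are assumed parameter names, not gsplat's API.
def report_nonfinite(tensors):
    """Print which tensors contain NaN/Inf entries and how many."""
    for name, t in tensors.items():
        if t is None:
            continue
        bad = ~torch.isfinite(t)
        if bad.any():
            print(f"{name}: {int(bad.sum())} non-finite of {t.numel()} entries")

def debug_train_step(splats, compute_loss, optimizer):
    # Check the parameters *before* the forward pass: if scales/opacities are
    # already NaN here, the backward error is only a symptom.
    report_nonfinite({k: v.detach() for k, v in splats.items()})

    # detect_anomaly() is slow; enable it only while tracking the bug down.
    with torch.autograd.detect_anomaly():
        loss = compute_loss()
        loss.backward()

    # Check the gradients too: a non-finite gradient on scales or opacities
    # usually shows up one step before the parameters themselves blow up.
    report_nonfinite({f"{k}.grad": v.grad for k, v in splats.items()})

    optimizer.step()
    optimizer.zero_grad(set_to_none=True)

If the non-finite values first appear in the gradients rather than in the parameters, common culprits are a too-aggressive learning rate on the scales/opacities or degenerate (near-zero-scale) Gaussians; lowering those learning rates or clamping the activations is worth trying before digging into the CUDA backward itself.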

insomniaaac · Oct 16 '24 08:10

Hi, have you solved this problem?

kevinchiu19 · Jun 17 '25 11:06