nerf-pytorch
RuntimeError: Function 'PowBackward0' returned nan values in its 0th output.
During the learning process, the following error occurs and learning is interrupted.
[TRAIN] Iter: 40300 Loss: 0.011321269907057285 PSNR: 23.059185028076172
11%|█████████████████████▎ | 20356/180001 [1:25:30<11:07:17, 3.99it/s][W python_anomaly_mode.cpp:104] Warning: Error detected in PowBackward0. Traceback of forward call that caused the error:
File "run_nerf.py", line 858, in <module>
train()
File "run_nerf.py", line 751, in train
img_loss0 = img2mse(extras['rgb0'], target_s)
File "/app/nerf/run_nerf_helpers.py", line 12, in <lambda>
img2mse = lambda x, y : torch.mean((x - y) ** 2)
(function _print_stack)
11%|█████████████████████▎ | 20356/180001 [1:25:30<11:10:36, 3.97it/s]
Traceback (most recent call last):
File "run_nerf.py", line 858, in <module>
train()
File "run_nerf.py", line 755, in train
loss.backward()
File "/opt/conda/lib/python3.8/site-packages/torch/tensor.py", line 245, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/opt/conda/lib/python3.8/site-packages/torch/autograd/__init__.py", line 145, in backward
Variable._execution_engine.run_backward(
RuntimeError: Function 'PowBackward0' returned nan values in its 0th output.
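For context, the forward-call traceback above is produced by PyTorch's anomaly detection, which the run evidently had switched on. A minimal sketch of how that mode is enabled (the tensor names are placeholders, not the repo's exact code):

import torch

# Anomaly detection makes autograd record the forward op that later produces
# NaN gradients (here PowBackward0, i.e. the ** 2 inside img2mse) and report it.
torch.autograd.set_detect_anomaly(True)

img2mse = lambda x, y: torch.mean((x - y) ** 2)
rgb = torch.rand(1024, 3, requires_grad=True)   # placeholder prediction
target_s = torch.rand(1024, 3)                  # placeholder ground truth

loss = img2mse(rgb, target_s)
loss.backward()  # if rgb contained NaN, this would raise the RuntimeError shown above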
Here's my configuration.
expname = mydata_test
basedir = ./logs
datadir = ./data/nerf_llff_data/mydata
dataset_type = llff
factor = 8
llffhold = 8
N_rand = 1024
N_samples = 64
N_importance = 64
use_viewdirs = True
raw_noise_std = 1e0
Maybe this can be fixed by adding eps here?
Have the same issue here. Any solutions?
Maybe this can be fixed by adding eps here? Sorry sir, what is eps?
eps means epsilon (ε), i.e. a very small constant such as 0.0000001 (1e-7).
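For illustration only, "adding eps" usually means guarding a division (or a value fed into a power, log, or sqrt) with that small constant so it cannot blow up to inf or NaN. The names below are placeholders, not the exact line the earlier comment points at:

import torch

eps = 1e-7  # "eps" = epsilon, a tiny constant

depth = torch.rand(1024)         # placeholder tensors, for illustration only
weights_sum = torch.rand(1024)

# Risky: if the denominator reaches 0, the result is inf, which turns into NaN
# as soon as it is multiplied by 0 or differentiated.
# disp = 1.0 / (depth / weights_sum)

# Safer: keep the denominator bounded away from zero before dividing.
disp = 1.0 / torch.clamp(depth / (weights_sum + eps), min=eps)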
Do you have code to reproduce the error?
I saw m1kit was testing his own set, and my error popped up when I was training my own set as well. I used COLMAP to get the camera position info, and it can run for around 20k iterations, but it stops randomly at some point with RuntimeError: Function 'PowBackward0' returned nan values in its 0th output.
I tried the fern sample set and got no error at all (sometimes it runs out of GPU memory, but no error once I reduced the settings). I did not change much in the code except adding DataParallel to use all four GPUs at the same time.
I'm just wondering what m1kit did and am waiting for his response.
Unfortunately, for personal reasons, I cannot provide the dataset that caused this error. To be honest, it was 4 months ago, so it's hard to remember how to reproduce it in detail. I apologize for not being able to help you.
Hello, I encountered the same problem when using SCNeRF, which borrows heavily from this repository, to train on custom data.
Data
The data can be accessed through this google drive link: https://drive.google.com/drive/folders/1SUzKMn6oD4inzN-m7RmHVl7gGEnq-Iv4?usp=sharing
Logs
[TRAIN] Iter: 209100 Loss: 0.006338230334222317 PSNR: 25.197158813476562
[TRAIN] Iter: 209200 Loss: 0.007395393215119839 PSNR: 24.48368263244629
[TRAIN] Iter: 209300 Loss: 0.007888318039476871 PSNR: 24.342876434326172
[TRAIN] Iter: 209400 Loss: 0.00826267059892416 PSNR: 24.05372428894043
[TRAIN] Iter: 209500 Loss: 0.0067442795261740685 PSNR: 24.944828033447266
Starts Validation Rendering
VAL PSNR 144: 22.382625579833984
Validation PRD : 0.4792793095111847
File "run_nerf.py", line 1052, in <module>
train()
File "run_nerf.py", line 506, in train
train_loss_0 = img2mse(extras['rgb0'], target_s)
File "/home/julius_m/code/SCNeRF/NeRF/run_nerf_helpers.py", line 10, in <lambda>
img2mse = lambda x, y : torch.mean((x - y) ** 2)
(function _print_stack)
26%|██████████████████████████████████████████▋ | 209573/800000 [8:00:36<22:34:01, 7.27it/s]
Traceback (most recent call last):
File "run_nerf.py", line 1052, in <module>
train()
File "run_nerf.py", line 606, in train
train_loss.backward()
File "/home/julius_m/miniconda3/envs/icn/lib/python3.8/site-packages/torch/_tensor.py", line 255, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/home/julius_m/miniconda3/envs/icn/lib/python3.8/site-packages/torch/autograd/__init__.py", line 147, in backward
Variable._execution_engine.run_backward(
RuntimeError: Function 'PowBackward0' returned nan values in its 0th output.
! [Numerical Error] rgb_map contains nan or inf.
! [Numerical Error] disp_map contains nan or inf.
! [Numerical Error] acc_map contains nan or inf.
! [Numerical Error] raw contains nan or inf.
! [Numerical Error] rgb0 contains nan or inf.
! [Numerical Error] disp0 contains nan or inf.
! [Numerical Error] acc0 contains nan or inf.
! [Numerical Error] z_std contains nan or inf.
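Those "! [Numerical Error] ..." lines come from a debug check that scans the rendering outputs for non-finite values; roughly, it looks like the sketch below (the actual code in run_nerf.py may differ in detail):

import torch

DEBUG = True

def report_nonfinite(ret):
    # Print one warning per returned tensor (rgb_map, disp_map, ...) that
    # contains NaN or inf, matching the messages in the log above.
    for k in ret:
        if DEBUG and (torch.isnan(ret[k]).any() or torch.isinf(ret[k]).any()):
            print(f"! [Numerical Error] {k} contains nan or inf.")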
Launch script
cd NeRF
python run_nerf.py \
--config configs/llff_data/lamp.txt \
--expname lamp \
--chunk 8192 \
--N_rand 1024 \
--camera_model pinhole_rot_noise_10k_rayo_rayd \
--ray_loss_type proj_ray_dist \
--multiplicative_noise True \
--i_ray_dist_loss 10 \
--grid_size 10 \
--ray_dist_loss_weight 0.0001 \
--N_iters 800001 \
--use_custom_optim True \
--ray_o_noise_scale 1e-3 \
--ray_d_noise_scale 1e-3 \
--non_linear_weight_decay 0.1 \
--add_ie 200000 \
--add_od 400000 \
--add_prd 600000
Config
Note: make sure to change the datadir to where you downloaded the above data.
configs/llff_data/lamp.txt
expname = lamp
basedir = ./logs
datadir = <path_to_lamp_dir>/lamp
dataset_type = llff
factor = 8
llffhold = 8
N_rand = 1024
N_samples = 64
N_importance = 64
use_viewdirs = True
raw_noise_std = 1e0
I can confirm this problem is happening to me on https://github.com/apchenstu/mvsnerf, trying out with either the lego synthetic dataset, or the orchid llff dataset.
I'll try to see how to make this reproducible.
Hello, I encountered the same problem when using SCNeRF, which borrows heavily from this repository, to train on custom data.
Data
The data can be accessed through this google drive link: https://drive.google.com/drive/folders/1SUzKMn6oD4inzN-m7RmHVl7gGEnq-Iv4?usp=sharing
Hi @AugustasMacijauskas did you have any success training with your custom dataset?
@davodogster No, I lost my patience and moved on to other things. I was also having a hard time figuring out how to debug this efficiently, since training for a few hours before it crashes and then changing one line of code and seeing if that helps is not going to work.
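One way to make this cheaper to investigate (just a sketch, not code from this repo; the argument names only roughly mirror the variables in run_nerf.py's train loop): detect the NaN the moment it first appears and dump the offending batch plus the current weights, so the failing step can be replayed in seconds instead of re-training for hours.

import torch

def check_and_snapshot(loss, i, batch_rays, target_s, model, path='nan_debug_snapshot.pt'):
    # If the loss is no longer finite, save the batch that triggered it and the
    # current weights so the step can be replayed offline, then stop training.
    if not torch.isfinite(loss):
        torch.save({
            'iteration': i,
            'batch_rays': batch_rays,
            'target_s': target_s,
            'model_state': model.state_dict(),
        }, path)
        raise RuntimeError(f'NaN/inf loss at iteration {i}; snapshot saved to {path}')

Called right after the loss is computed, this turns a multi-hour reproduction into loading a single .pt file.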
If the error is in the 0th output, that means your weights are not yet fully trained, so some values in some batch's predictions during your first epochs are NaNs. So it is not your inputs but your model's predictions that are NaNs; it could be an overflow or underflow error. This will make any loss function give you tensor(nan).
What you can do is add a check for when the loss is NaN and let the weights keep adjusting themselves:
criterion = SomeLossFunc()   # whichever loss you are using
eps = 1e-6

loss = criterion(preds, targets)
# If the loss came out as NaN, swap in a tiny constant so backward() does not
# push NaN gradients into the weights. Note: do NOT call loss.item() here,
# since converting to a Python float would detach the loss from the graph.
if torch.isnan(loss):
    loss = torch.full_like(loss, eps, requires_grad=True)
loss = loss + L1_loss  # + ... any other loss terms you have