RuntimeError: CUDA error: device-side assert triggered
Thanks for your excellent work!
But when I run the command python exp_runner.py --mode train --conf ./confs/wmask_open.conf --case real_capture_fan, I get the following error:
/opt/conda/conda-bld/pytorch_1614378124864/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [0,0,0], thread: [0,0,0] Assertion `input_val >= zero && input_val <= one` failed.
……
Traceback (most recent call last):
  File "exp_runner.py", line 934, in <module>
    runner.train()
  File "exp_runner.py", line 204, in train
    loss.backward()
  File "/home/zxy/.conda/envs/neus/lib/python3.7/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/zxy/.conda/envs/neus/lib/python3.7/site-packages/torch/autograd/__init__.py", line 147, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: CUDA error: device-side assert triggered
Is this a problem with the CUDA device? Which parameters can I adjust to get it running?
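For reference, the assertion at Loss.cu:102 is the input range check in PyTorch's CUDA binary cross entropy kernel: every predicted value has to lie in [0, 1]. A NaN fails that check as well, so the cause may be in the values reaching the loss rather than in the device. Below is a minimal sketch of the same check in plain Python, so it runs without a GPU. The function names are hypothetical and are not part of exp_runner.py or PyTorch.

```python
import math


def find_invalid_bce_inputs(values):
    """Return the indices of values that violate 0 <= value <= 1.

    NaN fails every comparison, so it is reported by the same test.
    """
    return [i for i, v in enumerate(values) if not (0.0 <= v <= 1.0)]


def checked_binary_cross_entropy(preds, targets, eps=1e-12):
    """Mean binary cross entropy that raises a readable error up front.

    Hypothetical helper for illustration: it mirrors the range check that
    the CUDA kernel asserts on, but reports the offending positions.
    """
    bad = find_invalid_bce_inputs(preds)
    if bad:
        raise ValueError(f"predictions outside [0, 1] (or NaN) at indices {bad}")
    total = 0.0
    for p, t in zip(preds, targets):
        # Clamp away from 0 so log() is always defined.
        total -= t * math.log(max(p, eps)) + (1.0 - t) * math.log(max(1.0 - p, eps))
    return total / len(preds)


if __name__ == "__main__":
    print(find_invalid_bce_inputs([0.2, float("nan"), 1.5, -0.1]))
    print(checked_binary_cross_entropy([0.9, 0.1], [1.0, 0.0]))
```

Running the training command with the environment variable CUDA_LAUNCH_BLOCKING=1 makes CUDA errors surface synchronously, so the traceback should point at the call that actually failed instead of at loss.backward().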