nerf-pytorch
RuntimeError: CUDA error: device-side assert triggered
Dear Author,
Thank you for the cool implementation. I installed it successfully and tried to run "python train_nerf.py --config config/lego.yml", but I am getting RuntimeError: CUDA error: device-side assert triggered.
Traceback (most recent call last):
  File "train_nerf.py", line 404, in <module>
Any suggestions to solve this?
Thank you!
It's hard to tell without knowing the exact config, but does this issue seem to help you? https://github.com/krrish94/nerf-pytorch/issues/9
I tried reducing the chunk size and the number of layers, but the error still persists. Also, I do not have issues with GPU memory. If you are talking about the model config file, I just used the default config file given in the GitHub repo.
I think the number of labels is wrong, which causes an error when calculating the loss value.
But as far as I understand, there is no number of labels involved here.
It's hard to tell without knowing the exact config, but does this issue seem to help you? #9

@krrish94 Here is more information about the error. It looks like the error is in /nerf-pytorch/nerf/nerf_helpers.py, line 301:
cdf_g = torch.gather(cdf.unsqueeze(1).expand(matched_shape), 2, inds_g)

==============================================================
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [47,0,0], thread: [47,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [47,0,0], thread: [48,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [47,0,0], thread: [50,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [47,0,0], thread: [53,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [47,0,0], thread: [54,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [47,0,0], thread: [56,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [47,0,0], thread: [59,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [47,0,0], thread: [60,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [47,0,0], thread: [62,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
0%| | 0/200000 [00:08<?, ?it/s]
Traceback (most recent call last):
  File "train_nerf.py", line 406, in <module>
    main()
  File "train_nerf.py", line 242, in main
    encode_direction_fn=encode_direction_fn,
  File "/home/aswamy/github_repos/NeRF/nerf-pytorch-krish/nerf-pytorch/nerf/train_utils.py", line 180, in run_one_iter_of_nerf
    for batch in batches
  File "/home/aswamy/github_repos/NeRF/nerf-pytorch-krish/nerf-pytorch/nerf/train_utils.py", line 180, in <listcomp>
    for batch in batches
  File "/home/aswamy/github_repos/NeRF/nerf-pytorch-krish/nerf-pytorch/nerf/train_utils.py", line 101, in predict_and_render_radiance
    det=(getattr(options.nerf, mode).perturb == 0.0),
  File "/home/aswamy/github_repos/NeRF/nerf-pytorch-krish/nerf-pytorch/nerf/nerf_helpers.py", line 301, in sample_pdf_2
    cdf_g = torch.gather(cdf.unsqueeze(1).expand(matched_shape), 2, inds_g)
RuntimeError: CUDA error: device-side assert triggered
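(A general debugging note, not specific to this repo: CUDA kernels launch asynchronously, so the Python line reported after a device-side assert can sometimes be a later op than the one that actually failed. To make the failure synchronous, you can set CUDA_LAUNCH_BLOCKING before any CUDA work, e.g. at the very top of train_nerf.py. A minimal sketch:)

```python
import os

# Must run before the first CUDA call (e.g. at the very top of train_nerf.py):
# kernel launches become synchronous, so the device-side assert is raised at
# the Python line of the op that actually failed.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```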
@krrish94 Following up on the previous comment: I tried printing the tensors and found that the values of the tensor inds_g are far too large or too small, which causes the out-of-bounds error:
inds_g.min() = tensor(-4993021444723710459)
inds_g.max() = tensor(4575432887736600530)
inds_g = tensor([[[ 4255818524050935954, 62], [ 4256250760978027750, 62], [ 4237722774569238629, 62], ...,]]
This tensor is used in file "run_nerf_helpers.py" here:
cdf_g = torch.gather(cdf.unsqueeze(1).expand(matched_shape), 2, inds_g)
More update: these large index values are actually coming from torchsearchsorted.searchsorted(). After the line
inds = torchsearchsorted.searchsorted(cdf, u, side="right")
the inds values are extreme (out of bounds).
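To confirm that the bad indices really originate from the external package, one quick check (a sketch only; it assumes the torchsearchsorted package from this repo's requirements is installed and a CUDA device is available) is to run it next to PyTorch's built-in torch.searchsorted on the same toy inputs and compare:

```python
import torch
import torchsearchsorted  # external package used by this repo

# Toy tensors roughly shaped like cdf and u in sample_pdf_2; exact sizes don't matter.
cdf = torch.cumsum(torch.softmax(torch.rand(1024, 62, device="cuda"), dim=-1), dim=-1)
u = torch.rand(1024, 64, device="cuda")

inds_ext = torchsearchsorted.searchsorted(cdf, u, side="right")  # call as quoted above
inds_ref = torch.searchsorted(cdf, u, right=True)                # built-in since PyTorch 1.6

print("external package:", inds_ext.min().item(), inds_ext.max().item())
print("torch built-in:  ", inds_ref.min().item(), inds_ref.max().item())
print("identical:", torch.equal(inds_ext.long(), inds_ref.long()))
```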
It may be related to the issue you created for searchsorted() @krrish94
I replaced torchsearchsorted.searchsorted() with the official torch.searchsorted(); now the error is gone and training runs successfully, though I am not sure about the influence of this change on performance. I think it may be worth mentioning somewhere, because I spent some time figuring this out :)
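For anyone else hitting this, the change is roughly the following (a minimal sketch with toy tensors; the exact surrounding lines in nerf/nerf_helpers.py may differ, and the clamping shown mirrors common NeRF implementations rather than being copied verbatim from this repo):

```python
import torch

# Stand-ins for the real cdf and u inside sample_pdf_2.
cdf = torch.cumsum(torch.softmax(torch.rand(1024, 62), dim=-1), dim=-1)
u = torch.rand(1024, 64)

# Before: inds = torchsearchsorted.searchsorted(cdf, u, side="right")
# After (torch.searchsorted is built in since PyTorch 1.6):
inds = torch.searchsorted(cdf, u, right=True)

# Clamp to a valid range before the indices feed torch.gather, so every entry
# lies inside [0, cdf.shape[-1] - 1].
below = torch.max(torch.zeros_like(inds - 1), inds - 1)
above = torch.min((cdf.shape[-1] - 1) * torch.ones_like(inds), inds)
inds_g = torch.stack((below, above), dim=-1)  # this is the tensor passed to gather
```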
Thank you!
Thanks so much for digging into this! From a skim this appears to be due to a weird config that's potentially leading to indexing errors. I'd trust the newer torch searchsorted function as opposed to the external package.
Can you update your code? I changed my code, but it does not work.
I think you can try upgrading your Python libraries, such as numpy; I did that and it ran successfully.