Vox-Fusion: CUDA error when the depth value is large

Open jarvishou829 opened this issue 1 year ago • 0 comments

I recorded a data sequence myself and ran the code on it. After processing about 800 frames, the error below appears. It seems that the size of map_states["voxel_vertex_idx"] and map_states["voxel_center_xyz"] exceeds num_embeddings, which is set to 20000 in the config file. When I set num_embeddings to 40000, the error appears again after 1400+ frames. How can I solve this correctly? I found that when the depth values are large, the size of map_states["voxel_vertex_idx"] and map_states["voxel_center_xyz"] tends to grow large as well. When I scale the depth values to 0.5 of their original values, the error no longer appears, but the rendered result is not good.
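To illustrate why the voxel count would depend on depth, here is a rough back-of-the-envelope sketch. It assumes that voxels are allocated only near observed surfaces and that the camera looks at a flat, fronto-parallel wall; the function name, the field-of-view values, and the voxel size are all hypothetical and not taken from the Vox-Fusion code or config.

```python
import math

def approx_surface_voxels(depth_m, voxel_size_m, fov_x_deg=60.0, fov_y_deg=45.0):
    """Rough count of voxels covering a fronto-parallel wall seen at depth_m.

    Hypothetical helper: the footprint of the view frustum on the wall grows
    linearly with depth in each direction, so the covered area grows
    roughly with depth squared.
    """
    width = 2.0 * depth_m * math.tan(math.radians(fov_x_deg) / 2.0)
    height = 2.0 * depth_m * math.tan(math.radians(fov_y_deg) / 2.0)
    return math.ceil(width / voxel_size_m) * math.ceil(height / voxel_size_m)

# Same scene, once at the original depth and once with depth scaled by 0.5.
full = approx_surface_voxels(depth_m=4.0, voxel_size_m=0.2)
half = approx_surface_voxels(depth_m=2.0, voxel_size_m=0.2)
print(full, half, full / half)
```

Under these assumptions, halving the depth cuts the per-frame voxel count to roughly a quarter, which would be consistent with the error disappearing after scaling. The total number of voxels in the map also depends on the camera trajectory, so this is only an estimate of the trend.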

../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [164,0,0], thread: [123,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [164,0,0], thread: [124,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [164,0,0], thread: [125,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [164,0,0], thread: [126,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [164,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Process Process-2:
Traceback (most recent call last):
  File "/home/user/miniconda3/envs/voxfusion/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/user/miniconda3/envs/voxfusion/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/user/nerf_ws/ori/voxfusion/src/mapping.py", line 128, in spin
    self.do_mapping(share_data, tracked_frame, writer=writer)
  File "/home/user/nerf_ws/ori/voxfusion/src/mapping.py", line 182, in do_mapping
    bundle_adjust_frames(
  File "/home/user/nerf_ws/ori/voxfusion/src/utils/renderer.py", line 496, in bundle_adjust_frames
    final_outputs = render_rays(
  File "/home/user/nerf_ws/ori/voxfusion/src/utils/renderer.py", line 288, in render_rays
    chunk_inputs = get_features(chunk_samples, map_states, voxel_size)
  File "/home/user/miniconda3/envs/voxfusion/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/user/nerf_ws/ori/voxfusion/src/utils/renderer.py", line 96, in get_features
    point_feats = F.embedding(F.embedding(
  File "/home/user/miniconda3/envs/voxfusion/lib/python3.8/site-packages/torch/nn/functional.py", line 2199, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device-side assert triggered
terminate called after throwing an instance of 'c10::CUDAError'
  what():  CUDA error: device-side assert triggered
Exception raised from operator() at ../c10/cuda/CUDACachingAllocator.cpp:1808 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x3e (0x7f29bcada20e in /home/user/miniconda3/envs/voxfusion/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x2759b (0x7f29bcb5559b in /home/user/miniconda3/envs/voxfusion/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: <unknown function> + 0x27621 (0x7f29bcb55621 in /home/user/miniconda3/envs/voxfusion/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #3: <unknown function> + 0x608180 (0x7f29afd28180 in /home/user/miniconda3/envs/voxfusion/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0x4669f8 (0x7f29afb869f8 in /home/user/miniconda3/envs/voxfusion/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #5: c10::TensorImpl::release_resources() + 0x175 (0x7f29bcac17a5 in /home/user/miniconda3/envs/voxfusion/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #6: <unknown function> + 0x3628c5 (0x7f29afa828c5 in /home/user/miniconda3/envs/voxfusion/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #7: <unknown function> + 0x67ca08 (0x7f29afd9ca08 in /home/user/miniconda3/envs/voxfusion/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #8: THPVariable_subclass_dealloc(_object*) + 0x2d5 (0x7f29afd9cdd5 in /home/user/miniconda3/envs/voxfusion/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #9: <unknown function> + 0x114b78 (0x55f151578b78 in /home/user/miniconda3/envs/voxfusion/bin/python)
frame #10: <unknown function> + 0x13b248 (0x55f15159f248 in /home/user/miniconda3/envs/voxfusion/bin/python)
frame #11: <unknown function> + 0x121e38 (0x55f151585e38 in /home/user/miniconda3/envs/voxfusion/bin/python)
frame #12: <unknown function> + 0x1330d8 (0x55f1515970d8 in /home/user/miniconda3/envs/voxfusion/bin/python)
frame #13: <unknown function> + 0x1330c1 (0x55f1515970c1 in /home/user/miniconda3/envs/voxfusion/bin/python)
frame #14: <unknown function> + 0x1330c1 (0x55f1515970c1 in /home/user/miniconda3/envs/voxfusion/bin/python)
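The assertion `srcIndex < srcSelectDimSize` in the log above is what an out-of-range index passed to an embedding lookup produces on the GPU. Because CUDA kernels run asynchronously, the Python traceback may not point at the exact failing call; setting the environment variable CUDA_LAUNCH_BLOCKING=1 makes it more precise. A minimal sketch of a bounds check that fails early with a readable message is shown below. `checked_embedding` is a hypothetical helper, not part of Vox-Fusion or PyTorch, and it assumes that negative indices are never valid at this point.

```python
import torch
import torch.nn.functional as F

def checked_embedding(indices, weight, name="indices"):
    """Look up rows of `weight`, raising a readable error on out-of-range indices.

    Hypothetical helper for debugging; it adds a device-to-host sync, so it is
    meant for diagnosis rather than for the training loop.
    """
    num_rows = weight.shape[0]
    if indices.numel() > 0:
        lo, hi = int(indices.min()), int(indices.max())
        if lo < 0 or hi >= num_rows:
            raise IndexError(
                f"{name}: index range [{lo}, {hi}] does not fit a table with {num_rows} rows"
            )
    return F.embedding(indices, weight)

# Table sized like num_embeddings = 20000 from the issue; feature width 16 is arbitrary.
weight = torch.randn(20000, 16)
ok = torch.tensor([[0, 19999]])
bad = torch.tensor([[0, 20000]])  # one past the last valid row

feats = checked_embedding(ok, weight)
try:
    checked_embedding(bad, weight, name="voxel_vertex_idx")
    message = ""
except IndexError as err:
    message = str(err)
print(message)
```

A check like this only reports the problem; it does not resolve it. If the map genuinely needs more voxels than num_embeddings allows, the table size, the voxel size, or the depth range used for allocation would have to change.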

Oct 31 '23 04:10 jarvishou829
