
RuntimeError: CUDA error: an illegal memory access was encountered

Open negiabhinit opened this issue 1 year ago • 1 comments

Checklist

My Question

Running the following script:

(o3dml) abhnegi@tghfg~/o3dml/Open3D-ML$ python scripts/run_pipeline.py torch -c ml3d/configs/pointpillars_kitti.yml --split test --dataset.dataset_path "/pfs/rdi/cea/rdicea_vru/01_Datasets/Kitti/" --pipeline ObjectDetection --dataset.use_cache True

gives me this error:

Traceback (most recent call last):
  File "/pfs/rdi/cea/home/abhnegi/o3dml/Open3D-ML/scripts/run_pipeline.py", line 261, in <module>
    sys.exit(main())
  File "/pfs/rdi/cea/home/abhnegi/o3dml/Open3D-ML/scripts/run_pipeline.py", line 190, in main
    pipeline.run_test()
  File "/home/abhnegi/miniconda3/envs/o3dml/lib/python3.10/site-packages/open3d/_ml3d/torch/pipelines/object_detection.py", line 114, in run_test
    self.run_valid()
  File "/home/abhnegi/miniconda3/envs/o3dml/lib/python3.10/site-packages/open3d/_ml3d/torch/pipelines/object_detection.py", line 191, in run_valid
    results = model(data)
  File "/home/abhnegi/miniconda3/envs/o3dml/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/abhnegi/miniconda3/envs/o3dml/lib/python3.10/site-packages/open3d/_ml3d/torch/models/point_pillars.py", line 132, in forward
    x = self.extract_feats(inputs)
  File "/home/abhnegi/miniconda3/envs/o3dml/lib/python3.10/site-packages/open3d/_ml3d/torch/models/point_pillars.py", line 104, in extract_feats
    voxels, num_points, coors = self.voxelize(points)
  File "/home/abhnegi/miniconda3/envs/o3dml/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/abhnegi/miniconda3/envs/o3dml/lib/python3.10/site-packages/open3d/_ml3d/torch/models/point_pillars.py", line 117, in voxelize
    res_voxels, res_coors, res_num_points = self.voxel_layer(res)
  File "/home/abhnegi/miniconda3/envs/o3dml/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/abhnegi/miniconda3/envs/o3dml/lib/python3.10/site-packages/open3d/_ml3d/torch/models/point_pillars.py", line 361, in forward
    [torch.zeros_like(points_feats[0:1, :]), points_feats])
RuntimeError: CUDA error: an illegal memory access was encountered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

negiabhinit, Jun 22 '24 08:06
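CUDA kernels are launched asynchronously, so the frame reported in a traceback like the one above is not necessarily where the fault happened. A minimal sketch of rerunning a command with `CUDA_LAUNCH_BLOCKING=1`, so that each kernel launch is synchronous and the error surfaces at the call that caused it. The helper names `blocking_env` and `run_blocking` are made up for illustration, and the probe command is only a stand-in for the real script:

```python
import os
import subprocess
import sys


def blocking_env(base=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base is None else base)
    # Must be set before the child process initializes CUDA.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_blocking(argv):
    """Run `argv` as a child process with CUDA_LAUNCH_BLOCKING=1 set."""
    return subprocess.run(argv, env=blocking_env(), capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in child command: it only echoes the variable to show it is inherited.
    # Replace it with the failing command, e.g. [sys.executable, "scripts/run_pipeline.py", ...].
    probe = [sys.executable, "-c", "import os; print(os.environ['CUDA_LAUNCH_BLOCKING'])"]
    print(run_blocking(probe).stdout.strip())
```

From a shell, prefixing the original command with `CUDA_LAUNCH_BLOCKING=1` has the same effect. Expect the run to be slower, since kernels no longer overlap with host code.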

Same issue with:

Exception has occurred: RuntimeError       (note: full exception trace is shown but execution is paused at: _run_module_as_main)
CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
  File "/home/lz/Codes/occ_seg_interpolation/.venv/lib/python3.10/site-packages/torch/_ops.py", line 755, in __call__
    return self._op(*args, **(kwargs or {}))
  File "/home/lz/Codes/occ_seg_interpolation/.venv/lib/python3.10/site-packages/open3d/ml/torch/python/ops.py", line 1210, in voxelize
    *_torch.ops.open3d.voxelize(points=points,
  File "/home/lz/Codes/occ_seg_interpolation/src/map/voxel_block.py", line 205, in grid_subsample
    ) = ml3d.ops.voxelize(
  File "/home/lz/Codes/occ_seg_interpolation/src/seg_occ.py", line 217, in main
    ) = grid_subsample(
  File "/home/lz/Codes/occ_seg_interpolation/.venv/lib/python3.10/site-packages/viztracer/decorator.py", line 78, in wrapper
    ret = func(*args, **kwargs)
  File "/home/lz/Codes/occ_seg_interpolation/src/batch_seg_occ.py", line 145, in single_process
    seg_occ_main(
  File "/home/lz/Codes/occ_seg_interpolation/src/batch_seg_occ.py", line 177, in main
    single_process(input_dir, prelabel_input_dir, adrn, vis_diff)
  File "/home/lz/Codes/occ_seg_interpolation/src/batch_seg_occ.py", line 189, in <module>
    main(
  File "/home/lz/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/lz/.local/share/uv/python/cpython-3.10.16-linux-x86_64-gnu/lib/python3.10/runpy.py", line 196, in _run_module_as_main (Current frame)
    return _run_code(code, main_globals, None,
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

It consistently occurs on the second call to `ml3d.ops.voxelize` (imported via `import open3d.ml.torch as ml3d`), whereas the first call is always fine.

HernandoR, Mar 19 '25 01:03
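A failure that only shows up on the second call is also what an asynchronous fault from the first call could look like, since the error is reported at the next point where the host waits for the GPU. One way to tell the two cases apart is to force a synchronization right after every call. The wrapper below is a sketch, not part of Open3D-ML: `sync_after` is a hypothetical helper, and the synchronize function is passed in so the snippet runs without a GPU (with PyTorch it would be `torch.cuda.synchronize`).

```python
import functools


def sync_after(fn, synchronize, label=None):
    """Wrap `fn` so that `synchronize()` runs right after every call.

    With PyTorch, `synchronize` would be `torch.cuda.synchronize`, which blocks
    until queued kernels finish and raises if one of them faulted.
    """
    name = label or getattr(fn, "__name__", "call")
    calls = 0

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        nonlocal calls
        calls += 1
        result = fn(*args, **kwargs)
        try:
            synchronize()
        except RuntimeError as exc:
            # The fault belongs to this call (or earlier un-synced work), not a later one.
            raise RuntimeError(f"{name} call #{calls} failed at its own sync point") from exc
        return result

    return wrapper
```

Assuming `torch` and `ml3d` are imported as in the comment above, usage would be `voxelize = sync_after(ml3d.ops.voxelize, torch.cuda.synchronize, label="voxelize")`. If call #1 now raises, the first call was already faulty and the second call only reported it.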