Pointnet_Pointnet2_pytorch

RuntimeError: CUDA error: device-side assert triggered when adding one extra dimension

Open leopardodavid opened this issue 2 years ago • 3 comments

Hi. First, nice job with this repository. I had been using it without problems for xyz input. I have now added one extra dimension (xyzw), since that is the format of my dataset and I would like to train with it. On my laptop's GPU everything works and training runs to completion. On a dedicated GPU, however, I get this error:

GPUs:  1
GPU processor set:  NVIDIA A100-SXM4-40GB
2022-05-12 11:17:09.877913: I tensorflow/stream_executor/platform/default/dso_loader.cc:54] Successfully opened dynamic library libcudart.so.11.0
PARAMETER ...
Namespace(batch_size=4, decay_rate=0.0001, epoch=3, gpu='0', learning_rate=0.0004, log_dir='GPU_trn_v2_4dim_test', lr_decay=0.0, model='pnet2_radar_semseg_msg', npoint=3097, optimizer='Adam', step_size=10, train_dataset_path=PosixPath('/RadarScenes/train'), train_snippet_path=PosixPath('static/train.txt'), valid_dataset_path=PosixPath('/RadarScenes/validation'), valid_snippet_path=PosixPath('static/validation.txt'))
Shape of training data:  torch.Size([4, 3097, 5]) . Shape of labels:  torch.Size([4, 3097])

Computing label weights -----------------
Progress: |██████████████████████████████████████████████████| 100.0% Complete
The number of training data is: 40
The number of test data is: 5
No existing model, starting training from scratch...
**** Epoch 1 (1/3) ****
Learning rate:0.000400
BN momentum updated to: 0.100000
  0%|          | 0/10 [00:00<?, ?it/s]
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [76,0,0], thread: [64,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [76,0,0], thread: [65,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [76,0,0], thread: [66,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [76,0,0], thread: [67,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [76,0,0], thread: [68,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [76,0,0], thread: [69,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [76,0,0], thread: [70,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [76,0,0], thread: [71,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [76,0,0], thread: [72,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [76,0,0], thread: [73,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [76,0,0], thread: [74,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [76,0,0], thread: [75,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [76,0,0], thread: [76,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [76,0,0], thread: [77,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [76,0,0], thread: [78,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [76,0,0], thread: [79,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [76,0,0], thread: [80,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [76,0,0], thread: [81,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [76,0,0], thread: [82,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [76,0,0], thread: [83,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [76,0,0], thread: [84,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [76,0,0], thread: [85,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [76,0,0], thread: [86,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [76,0,0], thread: [87,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [76,0,0], thread: [88,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [76,0,0], thread: [89,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [76,0,0], thread: [90,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [76,0,0], thread: [91,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [76,0,0], thread: [92,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [76,0,0], thread: [93,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [76,0,0], thread: [94,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:93: operator(): block: [76,0,0], thread: [95,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
  0%|          | 0/10 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "/workspace/PointNet/Pnet_pytorch/train_radar_semseg_msg.py", line 485, in <module>
    main(args)        
  File "/workspace/PointNet/Pnet_pytorch/train_radar_semseg_msg.py", line 345, in main
    seg_pred, trans_feat = classifier(points)  #pass the points thorugh the model network
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/PointNet/Pnet_pytorch/models/pnet2_radar_semseg_msg.py", line 34, in forward
    l2_xyz, l2_points = self.sa2(l1_xyz, l1_points) 
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/PointNet/Pnet_pytorch/models/pointnet2_utils.py", line 247, in forward
    grouped_xyz = index_points(xyz, group_idx) 
  File "/workspace/PointNet/Pnet_pytorch/models/pointnet2_utils.py", line 61, in index_points
    new_points = points[batch_indices, idx, :]
RuntimeError: CUDA error: device-side assert triggered

Do I need to change something in pointnet2_utils.py, or in the architecture definition? Many thanks in advance!

leopardodavid, May 12 '22 10:05
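
One way to narrow this down: the assertion in the log means that an index passed to index_points falls outside the range [-N, N), where N is the number of points. CUDA reports such errors asynchronously, so running with the environment variable CUDA_LAUNCH_BLOCKING=1, or on the CPU, usually gives a more reliable traceback. Below is a minimal sketch of a checked variant of index_points that fails early with the offending range. The name checked_index_points is hypothetical and not part of the repository, and the shapes are assumed to be [B, N, C] for points and [B, S] or [B, S, K] for idx.

```python
import torch


def checked_index_points(points, idx):
    """Gather points[b, idx[b, ...], :] after validating idx on the host.

    points: [B, N, C] tensor; idx: [B, S] or [B, S, K] long tensor.
    Hypothetical debugging helper, not part of the repository.
    """
    B, N, _ = points.shape
    # int() copies the value to the host, so the check runs before the gather.
    lo, hi = int(idx.min()), int(idx.max())
    if lo < -N or hi >= N:
        raise IndexError(f"idx range [{lo}, {hi}] is invalid for N={N} points")
    # One batch index per entry of idx, broadcast to the shape of idx.
    view_shape = [B] + [1] * (idx.dim() - 1)
    batch_indices = torch.arange(B, device=points.device).view(view_shape).expand_as(idx)
    return points[batch_indices, idx, :]
```

If the reported maximum equals N, the neighbour search found no point inside the radius for at least one centroid. One unverified hypothesis for why this appears only on the A100: depending on the PyTorch version, TF32 matrix multiplies may be enabled by default on Ampere GPUs, which lowers the precision of distances computed with torch.matmul. Setting torch.backends.cuda.matmul.allow_tf32 = False before training is a cheap way to test that.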

Hi, I ran into the same problem. Did you solve it?

Yellowshuohahaha, Jan 15 '23 03:01

Hello, I get the same error too. Did you manage to solve it? Thanks.

brbr1, Feb 16 '23 09:02

Try replacing query_ball_point in pointnet2_utils.py with the following (using the pytorch3d library):

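from pytorch3d.ops import ball_query

def query_ball_point(radius, nsample, xyz, new_xyz):
    # Up to nsample neighbours in xyz within `radius` of each point in new_xyz.
    # idx has shape [B, S, nsample]; slots without a neighbour are padded with -1.
    _, idx, _ = ball_query(p1=new_xyz, p2=xyz, K=nsample, radius=radius, return_nn=False)
    return idx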
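
One caveat, as far as I know: ball_query pads idx with -1 wherever fewer than nsample points lie inside the radius, and a -1 used as an index in PyTorch silently selects the last point instead of raising an error. The sketch below fills those slots with the first neighbour of each query point, which I believe matches what the original query_ball_point does. The helper name fill_padded_neighbors is hypothetical, and the fallback to index 0 for a query point with no neighbours at all is an arbitrary choice.

```python
import torch


def fill_padded_neighbors(idx):
    """Replace -1 padding in a [B, S, K] index tensor with each row's first index.

    Hypothetical helper; assumes padding is marked with negative values.
    """
    # First neighbour of every query point, repeated across the K slots.
    first = idx[:, :, :1].expand_as(idx)
    filled = torch.where(idx < 0, first, idx)
    # A row with no neighbours still holds -1 in slot 0; fall back to index 0.
    return filled.clamp(min=0)
```

It could be applied to idx just before it is returned from query_ball_point.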

jasonkena, Jun 12 '23 10:06