Inference example with pretrained model, CUDA kernel failed : no kernel image is available for execution on the device
After setting up the required environment, I ran the command
python tools/inference.py cfgs/PCN_models/AdaPoinTr.yaml ckpts/AdaPoinTr_PCN.pth --pc_root demo/ --save_vis_img --out_pc_root inference_result/
It then returns:
2024-08-03 13:13:59,386 - MODEL - INFO - Transformer with config {'NAME': 'AdaPoinTr', 'num_query': 512, 'num_points': 16384, 'center_num': [512, 256], 'global_feature_dim': 1024, 'encoder_type': 'graph', 'decoder_type': 'fc', 'encoder_config': {'embed_dim': 384, 'depth': 6, 'num_heads': 6, 'k': 8, 'n_group': 2, 'mlp_ratio': 2.0, 'block_style_list': ['attn-graph', 'attn', 'attn', 'attn', 'attn', 'attn'], 'combine_style': 'concat'}, 'decoder_config': {'embed_dim': 384, 'depth': 8, 'num_heads': 6, 'k': 8, 'n_group': 2, 'mlp_ratio': 2.0, 'self_attn_block_style_list': ['attn-graph', 'attn', 'attn', 'attn', 'attn', 'attn', 'attn', 'attn'], 'self_attn_combine_style': 'concat', 'cross_attn_block_style_list': ['attn-graph', 'attn', 'attn', 'attn', 'attn', 'attn', 'attn', 'attn'], 'cross_attn_combine_style': 'concat'}}
using group version 2
Loading weights from ckpts/AdaPoinTr_PCN.pth...
ckpts @ 353 epoch( performance = {'F-Score': 0.8446799506656607, 'CDL1': 6.527985830404605, 'CDL2': 0.19307194130320907, 'EMDistance': 0.0})
CUDA kernel failed : no kernel image is available for execution on the device
void furthest_point_sampling_kernel_wrapper(int, int, int, const float*, float*, int*) at L:228 in /home/mi/Pointnet2_PyTorch/pointnet2_ops_lib/pointnet2_ops/_ext-src/src/sampling_gpu.cu
I ran the code on an RTX 3060 Laptop GPU with gcc 9, torch-1.7.1+cu110, torchaudio-0.7.2, and torchvision-0.13.1+cu113.
I used the same command as you and got the same error. Here’s what I did.
Best Solution (here reference) :
- go to file /Pointnet2_PyTorch/pointnet2_ops_lib/pointnet2_ops/_ext-src/src/sampling_gpu.cu
- comment out all the lines with CUDA_CHECK_ERRORS(); (there are 3 places)
- run python3 setup.py install again in the pointnet2_ops_lib folder
Second Solution (here reference) : (I've tried this but it did not work for me)
- go to file /Pointnet2_PyTorch/pointnet2_ops_lib/setup.py
- change the line
  os.environ["TORCH_CUDA_ARCH_LIST"] = "3.7+PTX;5.0;6.0;6.1;6.2;7.0;7.5"
  to
  os.environ["TORCH_CUDA_ARCH_LIST"] = "5.0;6.0;6.1;6.2;7.0;7.5;8.0;8.6;8.7;8.9;9.0"
  or just add your specific CUDA arch code (see this list); in my case I use an A100, so it's 8.0
- run python3 setup.py install again in the pointnet2_ops_lib folder
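The "add your specific CUDA arch code" step could be sketched roughly as follows. This is only an illustration, not the library's own code: extend_arch_list and detect_capability are hypothetical helper names, the arch list is assumed to be semicolon-separated numeric entries as in the setup.py line quoted above, and the (8, 6) fallback is the RTX 3060 Laptop value mentioned in this thread.

```python
def extend_arch_list(arch_list: str, capability: tuple) -> str:
    """Return arch_list with 'major.minor' for capability appended if it is missing."""
    target = f"{capability[0]}.{capability[1]}"
    entries = [e.strip() for e in arch_list.split(";") if e.strip()]
    # Compare on the numeric part only, so "8.6+PTX" counts as already present.
    if target not in [e.split("+")[0] for e in entries]:
        entries.append(target)
    return ";".join(entries)


def detect_capability():
    """Return (major, minor) of GPU 0, or None if torch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    original = "3.7+PTX;5.0;6.0;6.1;6.2;7.0;7.5"
    # Fall back to 8.6 (assumed here for illustration) when no GPU is visible.
    capability = detect_capability() or (8, 6)
    print(extend_arch_list(original, capability))
```

The printed string is what you would paste into the TORCH_CUDA_ARCH_LIST line before rebuilding; whether the old entries should be kept or dropped depends on your CUDA toolkit, which this sketch does not check.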
The problem comes from the pointnet2_ops library, as shown in your output here:
void furthest_point_sampling_kernel_wrapper(int, int, int, const float*, float*, int*) at L:228 in /home/mi/Pointnet2_PyTorch/pointnet2_ops_lib/pointnet2_ops/_ext-src/src/sampling_gpu.cu
The Pointnet2_PyTorch library has not been maintained since July 31, 2021, as the contributor mentioned here.
Hello friend, I also ran into this issue while creating a Dockerfile containing AdaPoinTr. In the end, I found that the problem is what @Nineyoyoyo describes in the "Second Solution". TORCH_CUDA_ARCH_LIST tells the build which NVIDIA GPU compute capabilities to compile the CUDA kernel image for.
Pointnet2_PyTorch has not been maintained since July 31, 2021, and in Pointnet2_PyTorch/pointnet2_ops_lib/setup.py, TORCH_CUDA_ARCH_LIST is set to "3.7+PTX;5.0;6.0;6.1;6.2;7.0;7.5", which does not include your RTX 3060 Laptop's compute capability, 8.6. So even though you can build the kernel image, you cannot run it on your 3060!
So the solution is, as @Nineyoyoyo said, to change the value to "5.0;6.0;6.1;6.2;7.0;7.5;8.0;8.6;8.7;8.9;9.0" or even "8.0;8.6;8.7;8.9;9.0".
My platform is a 4070tis, and it has been running well since I made this change.
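To make the mismatch described above concrete, here is a minimal sketch that checks whether a given compute capability appears in an arch list string. The helper names listed_archs and is_listed are hypothetical, and the check is deliberately simplified: it looks only for a literal entry, and ignores PTX forward compatibility and binary compatibility within the same major version, both of which also affect what actually runs.

```python
def listed_archs(arch_list: str) -> set:
    """Parse 'a.b;c.d+PTX;...' into a set of (major, minor) tuples."""
    archs = set()
    for entry in arch_list.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        # Drop an optional "+PTX" suffix before reading the version numbers.
        major, minor = entry.split("+")[0].split(".")
        archs.add((int(major), int(minor)))
    return archs


def is_listed(arch_list: str, capability: tuple) -> bool:
    """True if the exact compute capability appears in the arch list."""
    return tuple(capability) in listed_archs(arch_list)


OLD = "3.7+PTX;5.0;6.0;6.1;6.2;7.0;7.5"              # value quoted from setup.py
NEW = "5.0;6.0;6.1;6.2;7.0;7.5;8.0;8.6;8.7;8.9;9.0"  # value suggested in this thread
RTX_3060_LAPTOP = (8, 6)

print(is_listed(OLD, RTX_3060_LAPTOP))  # False
print(is_listed(NEW, RTX_3060_LAPTOP))  # True
```

It assumes numeric, semicolon-separated entries like the ones quoted in this thread; other spellings of the arch list would need extra parsing.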
Hi, I originally got this same error and tried the fixes mentioned above, but they didn't work. The error has now moved from furthest point sampling to a new line:
pointnetpp/pointnet2_utils.py", line 104, in forward
return _ext.gather_points(features, idx)
RuntimeError: CUDA error: no kernel image is available for execution on the device
Pointnet builds and imports without any errors but does not execute. I'm on CUDA 12.3, Python 3.9.2, torch 2.1.2+cu121.
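When a rebuild still fails like this, it may help to post what PyTorch itself reports about the GPU alongside the versions. The snippet below is a rough diagnostic sketch: collect_env is a hypothetical helper name, and it only reports PyTorch's view of the device and does not inspect the compiled pointnet2 extension.

```python
def collect_env() -> dict:
    """Gather the PyTorch/CUDA facts relevant to a 'no kernel image' error."""
    info = {
        "torch_version": None,      # e.g. the installed torch build string
        "torch_cuda": None,         # CUDA version torch was built against
        "device_capability": None,  # (major, minor) of GPU 0
        "torch_arch_list": None,    # architectures torch itself was compiled for
    }
    try:
        import torch
    except ImportError:
        return info  # torch not installed: leave everything as None
    info["torch_version"] = torch.__version__
    info["torch_cuda"] = torch.version.cuda
    if torch.cuda.is_available():
        info["device_capability"] = torch.cuda.get_device_capability(0)
        info["torch_arch_list"] = torch.cuda.get_arch_list()
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Comparing device_capability against the TORCH_CUDA_ARCH_LIST value used for the extension build is then a manual step.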