QGTC_PPoPP22
CUDA error
Hello Yuke,
I tried using the public code and following the instructions, but I encountered an error when running the example: "CUDA error at Quantize_val: no kernel image is available for execution on the device". I have attempted to build the environment both in Docker and conda, and have tried other possible CUDA versions, but the error persists. Could you provide some insight into what may be causing it? Additionally, any advice on how to use your method, or a runnable library, would be greatly appreciated.
Thanks, Shuang
Hi @publiccoderepo, thanks for reaching out. May I know which GPU you are using and your CUDA/NVCC version?
Hi @YukeWang96, thanks for the reply. The GPU is an NVIDIA GeForce RTX 2080 Ti, and the CUDA version is 11.3.
This seems to be a problem with the GPU SM architecture targeted at compilation time. You can try changing the command in https://github.com/YukeWang96/PPoPP22_QGTC/tree/master#install-qgtc-go-to-qgtc_module-then-run to
TORCH_CUDA_ARCH_LIST="7.5" python setup.py clean --all install
where TORCH_CUDA_ARCH_LIST="7.5" targets RTX 20-series GPUs with the Turing architecture.
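In case it helps to confirm the right value, here is a minimal standalone sketch (not part of QGTC; the file name check_sm.cu is only for illustration) that queries each visible GPU's compute capability through the CUDA runtime API:

```cuda
// check_sm.cu -- prints each GPU's compute capability so the matching
// TORCH_CUDA_ARCH_LIST value can be chosen (e.g. 7.5 for an RTX 2080 Ti).
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        std::printf("GPU %d: %s, compute capability %d.%d\n",
                    i, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```

Compile it with nvcc check_sm.cu -o check_sm and pass the reported major.minor value (7.5 for Turing) to TORCH_CUDA_ARCH_LIST.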
Hi @YukeWang96, thanks for the follow-up.
I tried the command, but it reported the following errors:
kernel.h(314): error: identifier "bmmaBitOpAND" is undefined
kernel.h(461): error: identifier "bmmaBitOpAND" is undefined
kernel.h(596): error: identifier "bmmaBitOpAND" is undefined
kernel.h(726): error: identifier "bmmaBitOpAND" is undefined
kernel.h(691): warning: variable "gdm" was declared but never referenced
kernel.h(878): error: identifier "bmmaBitOpAND" is undefined
kernel.h(999): error: identifier "bmmaBitOpAND" is undefined
6 errors detected in the compilation of "QGTC_device.cu". error: command '/usr/local/cuda-11.3/bin/nvcc' failed with exit code 1
My deviceQuery output is 'deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.0, CUDA Runtime Version = 11.3, NumDevs = 4, Result = PASS'. Do you have any insight into this?
@publiccoderepo After checking the CUDA documentation, I found that bmmaBitOpAND
was not introduced until the Ampere GPUs (sm >= 80). Sorry about that. Here is the reference:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html?highlight=bmmaBitOpAND#sub-byte-operations:~:text=bmmaBitOpAND%20%3D%202%20%20//%20compute_80%20minimum
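For context, below is a minimal sketch (not QGTC's actual kernel) of the 1-bit Tensor Core path in the WMMA API: bmma_sync with bmmaBitOpAND is only defined for compute capability 8.0 and above, while Turing (sm_75) only exposes bmmaBitOpXOR, which is why kernels using the AND variant fail to compile for the RTX 2080 Ti.

```cuda
#include <mma.h>
using namespace nvcuda;

// 8x8x128 single-bit matrix multiply with AND + population-count accumulation.
// Guarded for sm_80+; bmmaBitOpAND is undefined on earlier architectures.
__global__ void bmma_and_popc(const unsigned* A, const unsigned* B, int* C) {
#if __CUDA_ARCH__ >= 800
    wmma::fragment<wmma::matrix_a, 8, 8, 128,
                   wmma::experimental::precision::b1, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 8, 8, 128,
                   wmma::experimental::precision::b1, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 8, 8, 128, int> c_frag;

    wmma::fill_fragment(c_frag, 0);
    wmma::load_matrix_sync(a_frag, A, 128);   // bit-packed rows, ldm in bits
    wmma::load_matrix_sync(b_frag, B, 128);

    // AND each bit pair, then accumulate via POPC.
    wmma::bmma_sync(c_frag, a_frag, b_frag, c_frag,
                    wmma::experimental::bmmaBitOpAND,
                    wmma::experimental::bmmaAccumulateOpPOPC);

    wmma::store_matrix_sync(C, c_frag, 8, wmma::mem_row_major);
#endif
}
```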
Hi @YukeWang96, the error was resolved with your advice. Thanks a lot for your help!