GPTQ-for-LLaMa
CUDA extension problem
I tried installing in NVIDIA Docker, and the generated ninja build includes incorrect SM IDs like -gencode arch=compute_52,code=sm_52
# Install kernels
python setup_cuda.py install
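For what it's worth, the wrong flags usually come from the arch-detection step: when TORCH_CUDA_ARCH_LIST is unset and no GPU is visible at build time (common inside docker build steps), you can end up with a fallback architecture such as sm_52. A quick diagnostic sketch, assuming only that torch is installed (_get_cuda_arch_flags is a private torch helper and may change between versions):

import torch
from torch.utils.cpp_extension import _get_cuda_arch_flags

print(torch.cuda.is_available())  # often False inside `docker build`
if torch.cuda.is_available():
    # e.g. (8, 6) on an RTX 3090, (8, 9) on an RTX 4090
    print(torch.cuda.get_device_capability(0))
# the -gencode flags torch would pick on this machine
print(_get_cuda_arch_flags())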
cuda_post_cflags = -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -gencode arch=compute_86,code=sm_86 -gencode arch=compute_90,code=sm_90 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1013"' -DTORCH_EXTENSION_NAME=quant_cuda -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++14
This should be ok.
setup_cuda.py should be changed as follows. (Note that sm_89 would be the RTX 4090, but it is not among the parameters.)
from setuptools import setup
from torch.utils import cpp_extension

# Build only the architectures we actually target:
# sm_80 (A100), sm_86 (RTX 3090), sm_90 (H100).
nvcc_args = [
    '-gencode', 'arch=compute_80,code=sm_80',
    '-gencode', 'arch=compute_86,code=sm_86',
    '-gencode', 'arch=compute_90,code=sm_90'
]

setup(
    name='quant_cuda',
    ext_modules=[cpp_extension.CUDAExtension(
        'quant_cuda', ['quant_cuda.cpp', 'quant_cuda_kernel.cu'],
        extra_compile_args={'nvcc': nvcc_args}
    )],
    cmdclass={'build_ext': cpp_extension.BuildExtension}
)
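As an alternative to hardcoding nvcc_args, here is a sketch that drives the same selection through the TORCH_CUDA_ARCH_LIST environment variable, which torch.utils.cpp_extension consults when no explicit -gencode flags are passed (sm_89 and sm_90 additionally require CUDA 11.8 and 12.0 respectively):

import os
from setuptools import setup
from torch.utils import cpp_extension

# Ampere (A100/3090), Ada (4090) and Hopper (H100) only; set before
# the build so cpp_extension picks it up. setdefault keeps any value
# already exported in the environment.
os.environ.setdefault('TORCH_CUDA_ARCH_LIST', '8.0;8.6;8.9;9.0')

setup(
    name='quant_cuda',
    ext_modules=[cpp_extension.CUDAExtension(
        'quant_cuda', ['quant_cuda.cpp', 'quant_cuda_kernel.cu']
    )],
    cmdclass={'build_ext': cpp_extension.BuildExtension}
)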
I'm not familiar with PyTorch CUDA extensions; however, this code doesn't work for me (RTX 3090). Also, please explain the benefits of this change.
I tested on my RTX 3090 and 4090 in NVIDIA Docker; there this setup is a must.
nvcc_args = [
    '-gencode', 'arch=compute_80,code=sm_80',
    '-gencode', 'arch=compute_86,code=sm_86',
]
I have to use this code to make it work normally. Does the CUDA version have any effect?
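If hardcoding the list per machine is a pain, a minimal sketch that derives the flags from the locally detected GPU (assuming a GPU is visible in the container at build time):

import torch

# Compute capability of GPU 0, e.g. (8, 6) on a 3090, (8, 9) on a 4090.
major, minor = torch.cuda.get_device_capability(0)
arch = f'{major}{minor}'
nvcc_args = ['-gencode', f'arch=compute_{arch},code=sm_{arch}']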
sm_90 is the H100; sm_80/sm_86 cover the A100 and RTX 3090, and the RTX 4090 (sm_89) can run sm_86 binaries. I have to specify the versions explicitly, or it will generate code for every SM, including obsolete ones like sm_52. This should be noted in the README, I think.
As for your problem, I think it is related to the driver version; older drivers do not support sm_90.
It seems to depend on the CUDA version; sm_90 is supported from CUDA 12.0.
https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/
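That constraint can be encoded directly in setup_cuda.py. A sketch (note the assumption that torch.version.cuda, the toolkit PyTorch was built against, matches the nvcc actually on the box):

import torch

nvcc_args = [
    '-gencode', 'arch=compute_80,code=sm_80',
    '-gencode', 'arch=compute_86,code=sm_86',
]
# sm_89 (Ada) requires CUDA >= 11.8, sm_90 (Hopper) requires >= 12.0.
cuda_major, cuda_minor = (int(v) for v in torch.version.cuda.split('.')[:2])
if (cuda_major, cuda_minor) >= (11, 8):
    nvcc_args += ['-gencode', 'arch=compute_89,code=sm_89']
if cuda_major >= 12:
    nvcc_args += ['-gencode', 'arch=compute_90,code=sm_90']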
We currently use Triton instead of CUDA.