GPTQ-for-LLaMa
CUDA extension problem
I tried installing in NVIDIA Docker, and the generated ninja build includes incorrect SM IDs like -gencode arch=compute_52,code=sm_52
# Install kernels
python setup_cuda.py install
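For what it's worth, the wrong flags usually come from the arch-detection step: when TORCH_CUDA_ARCH_LIST is unset and no GPU is visible at build time (common inside docker build steps), you can end up with a fallback architecture such as sm_52. A quick diagnostic sketch, assuming only that torch is installed (_get_cuda_arch_flags is a private torch helper and may change between versions):

import torch
from torch.utils.cpp_extension import _get_cuda_arch_flags

print(torch.cuda.is_available())  # often False inside `docker build`
if torch.cuda.is_available():
    # e.g. (8, 6) on an RTX 3090, (8, 9) on an RTX 4090
    print(torch.cuda.get_device_capability(0))
# the -gencode flags torch would pick on this machine
print(_get_cuda_arch_flags())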
cuda_post_cflags = -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -gencode arch=compute_86,code=sm_86 -gencode arch=compute_90,code=sm_90 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1013"' -DTORCH_EXTENSION_NAME=quant_cuda -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++14
This should be ok.
setup_cuda.py should be changed as follows. (Note that sm_89 would be the RTX 4090, but it is not among the parameters.)
from setuptools import setup
from torch.utils import cpp_extension

# Build only the architectures we actually target:
# sm_80 (A100), sm_86 (RTX 3090), sm_90 (H100).
nvcc_args = [
    '-gencode', 'arch=compute_80,code=sm_80',
    '-gencode', 'arch=compute_86,code=sm_86',
    '-gencode', 'arch=compute_90,code=sm_90'
]

setup(
    name='quant_cuda',
    ext_modules=[cpp_extension.CUDAExtension(
        'quant_cuda', ['quant_cuda.cpp', 'quant_cuda_kernel.cu'],
        extra_compile_args={'nvcc': nvcc_args}
    )],
    cmdclass={'build_ext': cpp_extension.BuildExtension}
)
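As an alternative to hardcoding nvcc_args, here is a sketch that drives the same selection through the TORCH_CUDA_ARCH_LIST environment variable, which torch.utils.cpp_extension consults when no explicit -gencode flags are passed (sm_89 and sm_90 additionally require CUDA 11.8 and 12.0 respectively):

import os
from setuptools import setup
from torch.utils import cpp_extension

# Ampere (A100/3090), Ada (4090) and Hopper (H100) only; set before
# the build so cpp_extension picks it up. setdefault keeps any value
# already exported in the environment.
os.environ.setdefault('TORCH_CUDA_ARCH_LIST', '8.0;8.6;8.9;9.0')

setup(
    name='quant_cuda',
    ext_modules=[cpp_extension.CUDAExtension(
        'quant_cuda', ['quant_cuda.cpp', 'quant_cuda_kernel.cu']
    )],
    cmdclass={'build_ext': cpp_extension.BuildExtension}
)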
I'm not familiar with PyTorch CUDA extensions; however, this code doesn't work for me (RTX 3090). Also, please explain the benefits of this change.
I tested on my RTX 3090 and 4090 in NVIDIA Docker; there this setup is a must.
nvcc_args = [
    '-gencode', 'arch=compute_80,code=sm_80',
    '-gencode', 'arch=compute_86,code=sm_86',
]
I have to use this code to make it work normally. Does the CUDA version have any effect?
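If hardcoding the list per machine is a pain, a minimal sketch that derives the flags from the locally detected GPU (assuming a GPU is visible in the container at build time):

import torch

# Compute capability of GPU 0, e.g. (8, 6) on a 3090, (8, 9) on a 4090.
major, minor = torch.cuda.get_device_capability(0)
arch = f'{major}{minor}'
nvcc_args = ['-gencode', f'arch=compute_{arch},code=sm_{arch}']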
sm_90 is the H100; sm_80/sm_86 cover the A100 and RTX 3090, and the RTX 4090 (sm_89) can run sm_86 binaries. I have to specify the versions explicitly, or it will generate code for every SM, including obsolete ones like sm_52. This should be noted in the README, I think.
As for your problem, I think it is related to the driver version; older drivers do not support sm_90.
It seems to depend on the CUDA version; sm_90 is supported from CUDA 12.0.
https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/
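That constraint can be encoded directly in setup_cuda.py. A sketch (note the assumption that torch.version.cuda, the toolkit PyTorch was built against, matches the nvcc actually on the box):

import torch

nvcc_args = [
    '-gencode', 'arch=compute_80,code=sm_80',
    '-gencode', 'arch=compute_86,code=sm_86',
]
# sm_89 (Ada) requires CUDA >= 11.8, sm_90 (Hopper) requires >= 12.0.
cuda_major, cuda_minor = (int(v) for v in torch.version.cuda.split('.')[:2])
if (cuda_major, cuda_minor) >= (11, 8):
    nvcc_args += ['-gencode', 'arch=compute_89,code=sm_89']
if cuda_major >= 12:
    nvcc_args += ['-gencode', 'arch=compute_90,code=sm_90']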
We currently use Triton instead of CUDA.