GPTQ-for-LLaMa
no module named quant_cuda (fastest-inference-4bit branch)
Issue: no module named quant_cuda
Branch: fastest-inference-4bit
After what seems to be a proper install, I get the error above when I try "import quant" or "import quant_cuda".
A related question: is llama_inference.py from the main/triton branch still usable once I switch to this branch?
Install logs:
!python setup_cuda.py install (this is on a Colab T4)
running install
/usr/local/lib/python3.10/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
/usr/local/lib/python3.10/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
running bdist_egg
running egg_info
writing quant_cuda.egg-info/PKG-INFO
writing dependency_links to quant_cuda.egg-info/dependency_links.txt
writing top-level names to quant_cuda.egg-info/top_level.txt
/usr/local/lib/python3.10/site-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
  warnings.warn(msg.format('we could not find ninja.'))
reading manifest file 'quant_cuda.egg-info/SOURCES.txt'
adding license file 'LICENSE.txt'
writing manifest file 'quant_cuda.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
/usr/local/lib/python3.10/site-packages/torch/utils/cpp_extension.py:388: UserWarning: The detected CUDA version (11.8) has a minor version mismatch with the version that was used to compile PyTorch (11.7). Most likely this shouldn't be a problem.
  warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda))
/usr/local/lib/python3.10/site-packages/torch/utils/cpp_extension.py:398: UserWarning: There are no g++ version bounds defined for CUDA version 11.8
  warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')
creating build/bdist.linux-x86_64/egg
copying build/lib.linux-x86_64-cpython-310/quant_cuda.cpython-310-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg
creating stub loader for quant_cuda.cpython-310-x86_64-linux-gnu.so
byte-compiling build/bdist.linux-x86_64/egg/quant_cuda.py to quant_cuda.cpython-310.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying quant_cuda.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying quant_cuda.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying quant_cuda.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying quant_cuda.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
writing build/bdist.linux-x86_64/egg/EGG-INFO/native_libs.txt
zip_safe flag not set; analyzing archive contents...
__pycache__.quant_cuda.cpython-310: module references __file__
creating 'dist/quant_cuda-0.0.0-py3.10-linux-x86_64.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing quant_cuda-0.0.0-py3.10-linux-x86_64.egg
removing '/usr/local/lib/python3.10/site-packages/quant_cuda-0.0.0-py3.10-linux-x86_64.egg' (and everything under it)
creating /usr/local/lib/python3.10/site-packages/quant_cuda-0.0.0-py3.10-linux-x86_64.egg
Extracting quant_cuda-0.0.0-py3.10-linux-x86_64.egg to /usr/local/lib/python3.10/site-packages
quant-cuda 0.0.0 is already the active version in easy-install.pth
Installed /usr/local/lib/python3.10/site-packages/quant_cuda-0.0.0-py3.10-linux-x86_64.egg
Processing dependencies for quant-cuda==0.0.0
Finished processing dependencies for quant-cuda==0.0.0
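For reference, a quick way to check whether the build actually produced an importable module is a cell like this (a minimal sketch; the site-packages path is taken from the log above, so adjust it if your environment differs):

# Sanity check after running setup_cuda.py install (Colab, Python 3.10).
import importlib.util
import os

site_packages = "/usr/local/lib/python3.10/site-packages"

# List anything quant-related that the install dropped into site-packages.
print([name for name in os.listdir(site_packages) if "quant" in name.lower()])

# Ask Python where (or whether) it can locate the compiled extension.
spec = importlib.util.find_spec("quant_cuda")
print("quant_cuda found at:", spec.origin if spec else None)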
@joshlevy89
I have tested the fastest-inference-4bit branch with the main/triton inference code and it works.
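The usual invocation looks something like the cell below; the model path, checkpoint file, and group size are placeholders, and the exact flags should be checked against the branch's README, so treat this as a sketch rather than a verified command:

!python llama_inference.py decapoda-research/llama-7b-hf --wbits 4 --groupsize 128 --load llama7b-4bit-128g.pt --text "this is llama"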
As for your strange issue, I would first fix the CUDA version mismatch shown in your compile log. It is also worth running pip install ninja so the extension builds with ninja instead of the slow distutils fallback. Then check the quant-cuda directory under /usr/local/lib/python3.10/site-packages to see whether the correct files were actually put there; the setup may have failed.
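Something along these lines in a Colab cell should cover those checks (the paths match the log above; adjust them if your environment differs):

# Install ninja so torch.utils.cpp_extension stops falling back to the slow distutils backend.
!pip install ninja

# Compare the system CUDA toolkit with the CUDA version PyTorch was built against
# (the log above shows 11.8 vs 11.7).
!nvcc --version
import torch
print("torch built with CUDA", torch.version.cuda)

# Rebuild the extension, then confirm the egg actually landed in site-packages.
!python setup_cuda.py install
!ls /usr/local/lib/python3.10/site-packages | grep -i quant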