ChatGLM-6B

Question: Is there any difference between ChatGLM-6B-INT8 and load(ChatGLM-6B).quantize(8)?

Open ninghongbo123 opened this issue 1 year ago • 12 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

Is there any difference between ChatGLM-6B-INT8 and load(ChatGLM-6B).quantize(8)?
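For reference, a minimal sketch of the two loading paths being compared, following the usage documented in the ChatGLM-6B README (model names as published on Hugging Face; the comments describe the expected difference, not measurements):

    from transformers import AutoModel

    # Path 1: load the full FP16 checkpoint, then quantize to INT8 at load time.
    # The full-precision weights are downloaded and quantized locally, so the
    # download and peak memory during loading are larger.
    model_quantized_at_load = AutoModel.from_pretrained(
        "THUDM/chatglm-6b", trust_remote_code=True
    ).quantize(8).half().cuda()

    # Path 2: load the checkpoint that was already quantized to INT8 offline.
    # Smaller download and no local quantization step; inference behavior
    # should be essentially the same as Path 1.
    model_prequantized = AutoModel.from_pretrained(
        "THUDM/chatglm-6b-int8", trust_remote_code=True
    ).half().cuda()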

Expected Behavior

No response

Steps To Reproduce

Environment

Anything else?

No response

ninghongbo123 avatar Apr 16 '23 16:04 ninghongbo123

What model were you trying to use?

kuso-ge avatar Apr 16 '23 18:04 kuso-ge

Same problem for me: vicuna-13b-GPTQ-4bit-128g, Windows 10, 8700, 2080 Ti.

7801943 avatar Apr 17 '23 13:04 7801943

Which version of GPTQ are you using? oobabooga's or latest?

jllllll avatar Apr 17 '23 17:04 jllllll

oobabooga's

Zach9113 avatar Apr 17 '23 18:04 Zach9113

@Zach9113

python -m pip install https://github.com/jllllll/GPTQ-for-LLaMa-Wheels/raw/main/quant_cuda-0.0.0-cp310-cp310-win_amd64.whl --force-reinstall

Also, make sure that you have the cuda branch of the GPTQ repo:

    git clone https://github.com/oobabooga/GPTQ-for-LLaMa -b cuda

You should already have it, since oobabooga removed the other branch, but check to be sure.

There was an update to the GPTQ code in the webui recently. Make sure your webui is updated. The new code may require you to switch to the latest GPTQ. I haven't updated yet myself, so I don't know.
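Taken together, the recovery steps above amount to roughly this sequence (run inside the environment opened by the cmd_windows.bat script; paths assume the one-click installer layout mentioned later in this thread, so adjust to your install):

    cd text-generation-webui\repositories
    git clone https://github.com/oobabooga/GPTQ-for-LLaMa -b cuda
    python -m pip install https://github.com/jllllll/GPTQ-for-LLaMa-Wheels/raw/main/quant_cuda-0.0.0-cp310-cp310-win_amd64.whl --force-reinstall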

jllllll avatar Apr 17 '23 18:04 jllllll

Where do I run that command? In the repositories folder?

Zach9113 avatar Apr 17 '23 19:04 Zach9113

I ran that command and it didn't change anything.

    File "C:\AI\oobabooga-windows\text-generation-webui\repositories\GPTQ-for-LLaMa\quant.py", line 426, in forward
        quant_cuda.vecquant4matmul(x, self.qweight, y, self.scales, self.qzeros, self.groupsize)
    NameError: name 'quant_cuda' is not defined

Zach9113 avatar Apr 17 '23 19:04 Zach9113

@Zach9113 It needs to be entered after opening the cmd.bat script. That will allow you to modify the virtual environment that the webui is installed with. I don't know if that command will fix the issue. As I said before, the GPTQ code in the webui was changed recently and I haven't had time to test anything.

Edit: I just re-installed and everything is working for me. Looking at the code in quant.py, I don't see why you would get that error. If quant_cuda is missing, then you would get a different error. If quant_cuda is loaded, then you shouldn't get that error at all. My knowledge of Python simply isn't good enough to know what the issue is.
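A likely explanation for both symptoms together: if quant.py guards the extension import with a try/except, the import failure only prints the "CUDA extension not installed." warning seen elsewhere in this thread, and the name quant_cuda is never bound, which produces a NameError rather than an ImportError the first time forward() touches it. A sketch of that pattern (hypothetical, not the verbatim file):

    # Hypothetical sketch of an import guard that would explain the symptoms:
    # the compiled extension fails to import, the failure is swallowed with a
    # printed warning, and quant_cuda is never bound in the module namespace.
    try:
        import quant_cuda
    except ImportError:
        print('CUDA extension not installed.')

    # Later, any call such as
    #     quant_cuda.vecquant4matmul(x, self.qweight, y, self.scales, self.qzeros, self.groupsize)
    # then raises: NameError: name 'quant_cuda' is not defined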

jllllll avatar Apr 17 '23 21:04 jllllll

@jllllll when I reinstalled everything there was an error with CUDA and it said it was set to 0.0.0. I'm at work right now; when I get off I'm going to start from scratch.

    C:\AI\oobabooga_windows\installer_files\env\lib\site-packages\setuptools\command\easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
      warnings.warn(
    running bdist_egg
    running egg_info
    creating quant_cuda.egg-info
    writing quant_cuda.egg-info\PKG-INFO
    writing dependency_links to quant_cuda.egg-info\dependency_links.txt
    writing top-level names to quant_cuda.egg-info\top_level.txt
    writing manifest file 'quant_cuda.egg-info\SOURCES.txt'
    reading manifest file 'quant_cuda.egg-info\SOURCES.txt'
    writing manifest file 'quant_cuda.egg-info\SOURCES.txt'
    installing library code to build\bdist.win-amd64\egg
    running install_lib
    running build_ext
    error: [WinError 2] The system cannot find the file specified

When I get the web UI running it does tell me CUDA extension not installed.
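For context, [WinError 2] during the build_ext step generally means an executable the source build needs (typically the MSVC compiler or nvcc) was not found on PATH. A quick check from the same cmd window, assuming a source build that needs both:

    where cl
    nvcc --version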

Zach9113 avatar Apr 18 '23 05:04 Zach9113

Same with me now, after doing everything.

LuciEdits avatar Apr 19 '23 16:04 LuciEdits

Same for me. During install I get this:

RuntimeError: Error compiling objects for extension
CUDA kernel compilation failed.
Attempting installation with wheel.
Collecting quant-cuda==0.0.0
  Using cached https://github.com/jllllll/GPTQ-for-LLaMa-Wheels/raw/main/quant_cuda-0.0.0-cp310-cp310-win_amd64.whl (398 kB)

I confirmed that the repo branch being cloned is in fact: git clone https://github.com/oobabooga/GPTQ-for-LLaMa -b cuda

but when trying to start up I too see the CUDA extension not installed

bin Z:\oobabooga\installer_files\env\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117.dll
Loading anon8231489123_gpt4-x-alpaca-13b-native-4bit-128g...
CUDA extension not installed.
Found the following quantized model: models\anon8231489123_gpt4-x-alpaca-13b-native-4bit-128g\gpt-x-alpaca-13b-native-4bit-128g-cuda.pt
Loading model ...
Done.

When trying to interact - similar errors:

  File "Z:\oobabooga\text-generation-webui\repositories\GPTQ-for-LLaMa\quant.py", line 426, in forward
    quant_cuda.vecquant4matmul(x, self.qweight, y, self.scales, self.qzeros, self.groupsize)
NameError: name 'quant_cuda' is not defined
Output generated in 0.31 seconds (0.00 tokens/s, 0 tokens, context 35, seed 1592413025)

Been trying to resolve this for weeks now, across several versions of textGenUI and the one-click installer.

RTX 3090 and a system with 64 GB RAM.
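One way to narrow this down is to test the import directly inside the webui's environment (open cmd_windows.bat first; a diagnostic sketch, not an official tool):

    python -c "import quant_cuda; print('quant_cuda OK')"

If that raises ImportError, neither the wheel nor the compiled extension ever landed in that environment, which would match the "CUDA extension not installed." message at startup.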

pawprint avatar Apr 24 '23 23:04 pawprint

> @Zach9113
>
> python -m pip install https://github.com/jllllll/GPTQ-for-LLaMa-Wheels/raw/main/quant_cuda-0.0.0-cp310-cp310-win_amd64.whl --force-reinstall
>
> Also, make sure that you have the cuda branch of the GPTQ repo: git clone https://github.com/oobabooga/GPTQ-for-LLaMa -b cuda. You should already have it, since oobabooga removed the other branch, but check to be sure.
>
> There was an update to the GPTQ code in the webui recently. Make sure your webui is updated. The new code may require you to switch to the latest GPTQ. I haven't updated yet myself, so I don't know.

Oh hell yea, this just fixed all the problems I have been having with oobabooga. I had one model sort of working before this; it was slow and didn't even seem to know it was an AI. It got really weird and told me it was impossible that it lived in a folder on a PC, lol. Not sure how I did that one.

nexusdragoon avatar Apr 29 '23 19:04 nexusdragoon

> @Zach9113
>
> python -m pip install https://github.com/jllllll/GPTQ-for-LLaMa-Wheels/raw/main/quant_cuda-0.0.0-cp310-cp310-win_amd64.whl --force-reinstall
>
> Also, make sure that you have the cuda branch of the GPTQ repo: git clone https://github.com/oobabooga/GPTQ-for-LLaMa -b cuda. You should already have it, since oobabooga removed the other branch, but check to be sure.
>
> There was an update to the GPTQ code in the webui recently. Make sure your webui is updated. The new code may require you to switch to the latest GPTQ. I haven't updated yet myself, so I don't know.

This fixed it for me too. My generation speed is also a lot faster now with the quantized models. Just running that command in cmd_windows.bat was enough for me.

ghost avatar May 17 '23 09:05 ghost

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

github-actions[bot] avatar Aug 31 '23 23:08 github-actions[bot]