lmdeploy [Feature] Do we support inference for GPTQ-quantized models?


Motivation

I only see support for AWQ; I have not seen any discussion of GPTQ.

Related resources

No response

Additional context

No response

Jul 10 '24 14:07 eigen2017

Not yet. Moreover, both GPTQ and AWQ are W4A16, and GPTQ has no advantage over AWQ in accuracy or performance.

Jul 10 '24 14:07 zhyncs

but awq cannot run on gpu v100

Jul 11 '24 02:07 eigen2017

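A technical question for you, @zhyncs. Take this model as an example: https://huggingface.co/Phind/Phind-CodeLlama-34B-v2 I only have 4 V100 cards with 64 GB of GPU memory in total, and I want to quantize it to int4. Does lmdeploy have a solution for this?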

Jul 11 '24 03:07 eigen2017

but awq cannot run on gpu v100

I don't have a development environment for V100. Could you follow https://lmdeploy.readthedocs.io/en/latest/quantization/w4a16.html and try it on V100? Please paste the error log. Thanks.

Jul 11 '24 03:07 zhyncs

@zhyncs Could you briefly explain what W4A16 and W8A8 mean?

Jul 11 '24 05:07 eigen2017

W4A16 means 4-bit weights and 16-bit activations. W8A8 is analogous: 8-bit weights and 8-bit activations.
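To see why the weight bit width matters for the 4 x V100 setup mentioned above, here is a rough back-of-the-envelope sketch. It assumes the model has about 34 billion parameters (taken from the model name) and counts weights only; scales, zero points, the KV cache, and activations are ignored, so real usage will be higher. The helper name `weight_gb` is made up for illustration.

```python
def weight_gb(num_params, bits_per_weight):
    """Approximate weight storage in GB (10**9 bytes), weights only."""
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 34e9  # assumption: "34B" in the model name

for label, bits in [("16-bit weights", 16), ("W8", 8), ("W4", 4)]:
    print(f"{label}: ~{weight_gb(NUM_PARAMS, bits):.0f} GB")
```

Under these assumptions, 4-bit weights need about a quarter of the space of 16-bit weights, which leaves headroom for the KV cache on 64 GB of total GPU memory.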

Jul 11 '24 05:07 zhyncs

Not yet. Moreover, both GPTQ and AWQ are W4A16, and GPTQ has no advantage over AWQ in accuracy or performance.

ref https://friendli.ai/blog/quantization-reduce-llm-size/

Jul 11 '24 06:07 zhyncs

#2090 adds support for both AWQ and GPTQ models on V100.

Jul 25 '24 14:07 lzhangzz

#2090 adds support for both AWQ and GPTQ models on V100.

Many thanks for this PR!

Jul 26 '24 05:07 eigen2017

I saw that this PR was merged: https://github.com/InternLM/lmdeploy/pull/2090 So I'll try this GPTQ model on V100: https://huggingface.co/TheBloke/Phind-CodeLlama-34B-v2-GPTQ If it succeeds, I'll report back here and close this issue.

Thanks to you all for this great effort! @lzhangzz @zhyncs

Aug 30 '24 08:08 eigen2017

Please try v0.6.0a0: https://github.com/InternLM/lmdeploy/releases/tag/v0.6.0a0

Aug 30 '24 08:08 zhyncs

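@zhyncs Hi, it fails with an error on startup.

Because the 0.6.0a version is not on PyPI, I could not pip install it, so I installed from source. I unpacked the 0.6.0a zip, entered the lmdeploy directory, and ran:

    mkdir -p build && cd build
    bash ../generate.sh make
    make -j$(nproc) && make install
    cd ..
    pip install -e .

Then, guided by the various errors, I changed these settings in the model config: [screenshot: 微信图片_20240830223657] bf16 changed to fp16, the quantization config group_size changed from -1 to 128, and desc_act changed from true to false.

As an aside: after the changes above, vLLM (note: not lmdeploy) fails on startup with the following error: [screenshot: vllm]

OK, back to lmdeploy. After changing the config, the error became the following: [screenshot: err]

Apologies that everything is a photo; the V100s are only reachable from the company intranet. Thanks!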

Aug 30 '24 15:08 eigen2017

@eigen2017

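Currently GPTQ is supported only with group_size=128 and desc_act=False (which covers most of the GPTQ models released in the Qwen series). Editing the quantization config directly does not change the properties of the weights themselves.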
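As a minimal sketch of that constraint, the check below inspects a GPTQ quantization config before trying to load the model. It is based only on the rule stated in this comment, not on lmdeploy's actual loading code; the function name `gptq_config_supported` is hypothetical, and the dict keys follow the `group_size` and `desc_act` fields discussed in this thread.

```python
def gptq_config_supported(quantize_config):
    """Return (ok, reasons) for the constraint stated in this thread:
    group_size must be 128 and desc_act must be False."""
    reasons = []
    group_size = quantize_config.get("group_size")
    if group_size != 128:
        reasons.append(f"group_size={group_size} (need 128)")
    if quantize_config.get("desc_act", False):
        reasons.append("desc_act=True (need False)")
    return (not reasons, reasons)


# Original settings of the Phind GPTQ checkpoint as reported in this thread.
phind = {"group_size": -1, "desc_act": True}
print(gptq_config_supported(phind))
```

Passing this check only means the config values match; as noted above, the weights themselves must actually have been quantized with those settings.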

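A model with group_size=-1 can be converted to group_size=128 by repeating scales and qzeros ceil_div(input_dims, 128) times. desc_act requires a few additional reordering operations, which have not been implemented yet.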
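The group_size=-1 conversion can be sketched in plain Python as follows. This is an illustration under assumptions, not lmdeploy's code: it assumes scales and zero points are stored with one row per group along the input dimension, uses unpacked integer zero points instead of the bit-packed qzeros found in real checkpoints, and dequantizes as scale * (q - zero). The helper names `expand_groups` and `dequantize` are hypothetical.

```python
def ceil_div(a, b):
    return (a + b - 1) // b


def expand_groups(rows, input_dims, group_size=128):
    """Repeat the single per-channel row once for every group."""
    if len(rows) != 1:
        raise ValueError("expected exactly one row (group_size=-1)")
    return [list(rows[0]) for _ in range(ceil_div(input_dims, group_size))]


def dequantize(q, scales, zeros, group_size):
    """q[i][j] is the integer weight for input i and output j."""
    return [
        [scales[i // group_size][j] * (v - zeros[i // group_size][j])
         for j, v in enumerate(row)]
        for i, row in enumerate(q)
    ]


input_dims, output_dims = 256, 2
q = [[(i + j) % 16 for j in range(output_dims)] for i in range(input_dims)]
scales = [[0.5, 0.25]]  # one group covering every input row
zeros = [[8, 7]]

scales_128 = expand_groups(scales, input_dims)
zeros_128 = expand_groups(zeros, input_dims)

# Both layouts dequantize to the same weights.
same = dequantize(q, scales, zeros, input_dims) == dequantize(q, scales_128, zeros_128, 128)
print(len(scales_128), same)
```

Because every group receives a copy of the same scale and zero point, the dequantized weights are unchanged; only the tensor layout now matches what a group_size=128 kernel expects.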

Aug 31 '24 05:08 lzhangzz

"The 0.6.0a version is not on PyPI": it is there. pip install lmdeploy==0.6.0a0

Aug 31 '24 09:08 lvhan028

@eigen2017

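Currently GPTQ is supported only with group_size=128 and desc_act=False (which covers most of the GPTQ models released in the Qwen series). Editing the quantization config directly does not change the properties of the weights themselves.

A model with group_size=-1 can be converted to group_size=128 by repeating scales and qzeros ceil_div(input_dims, 128) times. desc_act requires a few additional reordering operations, which have not been implemented yet.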

So this Phind model cannot be accelerated with lmdeploy for now, right? https://huggingface.co/TheBloke/Phind-CodeLlama-34B-v2-GPTQ

Sep 02 '24 08:09 eigen2017
