LightCompress

Question about exporting for vLLM

Open djm012 opened this issue 7 months ago • 1 comment

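Hello, I am quantizing Qwen-VL-7B, using awq_w_only.yml to quantize the language-layer parameters to 4 bits. For the export I set save_vllm=True to save the real quantized model, but why is the exported model larger than the original one? (The exported model is 28G; the original model is 16G.)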

djm012, Jun 04 '25 03:06

configs/quantization/backend/vllm/awq_w4a16.yml

    quant:
        method: Awq
        weight:
            bit: 4
            symmetric: True
            granularity: per_group
            group_size: 128
            need_pack: True
        special:
            trans: True
            trans_version: v2
            weight_clip: True
        quant_out: True

Note that need_pack needs to be specified.
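To illustrate why need_pack matters for file size, here is a minimal Python sketch of 4-bit packing. It is not LightCompress's or vLLM's actual packing code. The helper names (pack_int4, unpack_int4, weight_bytes) and the nibble order are assumptions made for illustration. The idea is that eight 4-bit values fit into one 32-bit word, so weights that are quantized but stored unpacked take no less space on disk than before.

```python
def pack_int4(values):
    """Pack unsigned 4-bit integers (0..15) into 32-bit words, 8 per word.

    Hypothetical helper: the lowest nibble holds the first value. Real
    backends may use a different nibble order.
    """
    if len(values) % 8 != 0:
        raise ValueError("length must be a multiple of 8")
    words = []
    for i in range(0, len(values), 8):
        word = 0
        for j, v in enumerate(values[i:i + 8]):
            if not 0 <= v <= 15:
                raise ValueError("value out of 4-bit range")
            word |= v << (4 * j)
        words.append(word)
    return words


def unpack_int4(words):
    """Inverse of pack_int4: recover the 4-bit values from 32-bit words."""
    return [(w >> (4 * j)) & 0xF for w in words for j in range(8)]


def weight_bytes(num_params, bits_per_weight):
    """Storage for the weight values only (ignores scales and zero points)."""
    return num_params * bits_per_weight // 8


if __name__ == "__main__":
    q = [1, 2, 3, 4, 5, 6, 7, 8]
    packed = pack_int4(q)
    print([hex(w) for w in packed])
    print(unpack_int4(packed) == q)
    # Rough arithmetic for an assumed 7e9 quantized parameters:
    for bits in (32, 16, 4):
        print(bits, "bits:", weight_bytes(7_000_000_000, bits) / 1e9, "GB")
```

The size estimate covers only the quantized weight values. Per-group scales and any layers left unquantized (for example the vision part of the model) add to the total.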

gushiqiao, Jun 06 '25 11:06