Question about exporting for vLLM

Open · djm012 opened this issue 7 months ago · 1 comment

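Hello, I am quantizing Qwen-VL-7B, using awq_w_only.yml to apply 4-bit quantization to the language-layer parameters. For the export I set save_vllm=True to save the real quantized model, but why is the exported model larger than the original? (The exported model is 28G; the original is 16G.)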

Jun 04 '25 03:06 djm012

configs/quantization/backend/vllm/awq_w4a16.yml

    quant:
        method: Awq
        weight:
            bit: 4
            symmetric: True
            granularity: per_group
            group_size: 128
            need_pack: True
        special:
            trans: True
            trans_version: v2
            weight_clip: True
        quant_out: True

Note that need_pack has to be specified.

Jun 06 '25 11:06 gushiqiao
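
To illustrate why the reply points at need_pack, here is a minimal sketch of what packing 4-bit weights means for storage size. It is not LLMC's or vLLM's actual code: the helper names `pack_int4` and `unpack_int4` and the simple low-nibble-first bit order are assumptions made for illustration, and real AWQ kernels may use a different layout. Scales and zero points are ignored. The idea is only that eight 4-bit values fit in one 32-bit word, so if the values are not packed, each one occupies a wider storage type and the checkpoint does not shrink.

```python
import struct


def pack_int4(values):
    """Pack unsigned 4-bit integers (0..15) into 32-bit words, eight per word.

    Hypothetical helper for illustration; the bit order is low nibble first.
    """
    if len(values) % 8 != 0:
        raise ValueError("length must be a multiple of 8")
    words = []
    for i in range(0, len(values), 8):
        word = 0
        for j, v in enumerate(values[i:i + 8]):
            if not 0 <= v <= 15:
                raise ValueError("value out of 4-bit range")
            word |= v << (4 * j)  # place the j-th value in bits 4j..4j+3
        words.append(word)
    return words


def unpack_int4(words):
    """Inverse of pack_int4: recover eight 4-bit values from each word."""
    return [(w >> (4 * j)) & 0xF for w in words for j in range(8)]


# One quantization group of 128 weights (matches group_size: 128 in the config).
group = [i % 16 for i in range(128)]
packed = pack_int4(group)

fp16_bytes = len(group) * 2  # 2 bytes per weight in fp16
packed_bytes = len(struct.pack(f"<{len(packed)}I", *packed))  # 4 bytes per word

print(fp16_bytes, packed_bytes)  # prints "256 64"
```

In this sketch the packed group takes a quarter of the fp16 size, which is roughly the reduction one would expect from a 4-bit weight-only export once packing is enabled.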