LightCompress
After upgrading LLMC's transformers and lmms_eval versions to support the Qwen2.5-VL-7B model, I configured both vision and language for GPTQ quantization, but it seems the two cannot be quantized at the same time.
YAML config:
```yaml
base:
    seed: &seed 0
model:
    # type: Qwen2VL
    type: Qwen2_5VL
    path: /mnt/data/junjun.zhao/models/Qwen2.5-VL-7B-Instruct
    # path: /mnt/data/junjun.zhao/models/Qwen2.5-VL-3B-Instruct
    # tokenizer_mode: fast
    torch_dtype: torch.float32
calib:
    # name: wikitext2
    # download: True
    # n_samples: 128
    # path: calib data path
    # bs: 1
    # seq_len: 2048
    # preproc: wikitext2_gptq
    # seed: *seed
    name: custom_mm
    n_samples: 128
    download: False
    path: /mnt/data/junjun.zhao/datasets/VQA_mini
    apply_chat_template: True
    add_answer: True # Default is False. If set to True, calib data will include answers.
    bs: 1
    seq_len: 512
    preproc: vlm_general
    padding: True
    seed: *seed
eval:
    # eval_pos: [pretrain, fake_quant]
    eval_pos: [fake_quant]
    type: vqa
    name: [mme]
    download: False
    path: /mnt/data/junjun.zhao/datasets/datasets/lmms-lab/MME
    # name: wikitext2
    # download: True
    # seq_len: 2048
    bs: 1
    inference_per_block: False
quant:
    method: GPTQ
    # quant_objects: [vision, language] # default is [language]
    vision:
        method: GPTQ
        weight:
            bit: 8
            symmetric: True
            granularity: per_channel
            group_size: -1
            calib_algo: mse
            mse_b_num: 2
        act:
            bit: 8
            symmetric: True
            granularity: per_token
            calib_algo: minmax
        special:
            actorder: True
            static_groups: False
            percdamp: 0.01
            blocksize: 128
            true_sequential: True
    language:
        method: GPTQ
        weight:
            bit: 8
            symmetric: True
            granularity: per_channel
            group_size: -1
            calib_algo: mse
            # mse_b_num: 2
        act:
            bit: 8
            symmetric: True
            granularity: per_token
            calib_algo: minmax
        special:
            actorder: True
            static_groups: False
            percdamp: 0.02
            blocksize: 128
            true_sequential: True
        quant_out: True
save:
    save_fake: True
    save_path: /mnt/data/junjun.zhao/saved_model/qwen2_5_vl_7b_gptq_w8a8_vlm_language/
    # save_path: /mnt/data/junjun.zhao/saved_model/qwen2_5_vl_3b_gptq_w8a8_vlm/
```
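For reference, the weight settings above (`bit: 8`, `symmetric: True`, `granularity: per_channel`, `group_size: -1`) correspond to the following fake-quantization math. This is a standalone sketch of symmetric per-channel weight fake quantization, not llmc's actual `EffcientFakeQuantLinear` implementation:

```python
import torch

def fake_quant_weight_per_channel(w: torch.Tensor, n_bits: int = 8) -> torch.Tensor:
    """Symmetric per-channel (one scale per output row) fake quantization of a Linear weight."""
    qmax = 2 ** (n_bits - 1) - 1                       # 127 for int8
    scale = w.abs().amax(dim=1, keepdim=True) / qmax   # per-output-channel scale (minmax style)
    scale = scale.clamp(min=1e-8)                      # avoid division by zero for all-zero rows
    q = torch.round(w / scale).clamp(-qmax - 1, qmax)  # quantize into the int8 range
    return q * scale                                   # dequantize back to float ("fake" quant)

w = torch.randn(4, 16)
w_fq = fake_quant_weight_per_channel(w)
# max round-trip error is bounded by half the per-channel scale
print((w - w_fq).abs().max())
```

The `per_token` activation setting works analogously but computes one scale per token (row of the activation matrix) at runtime; the `mse` calib_algo searches for the clipping range instead of using the plain minmax shown here.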
Version info:
- transformers: 4.57.0
- lmms_eval: 0.5.0

I modified some code to adapt to the newer transformers and lmms_eval versions.
Issue 1:
- quant_objects: [vision, language]: with this setting only the language path runs; stepping through the relevant code in a debugger shows the vision path is never reached.
- Configuring vision and language separately: during the quantization calibration (calib) stage both vision and language are reached, but deploy_fake_quant_model only processes vision; there is no for loop, so language is never reached.
- When deploy_fake_quant_model replaces Linear with EffcientFakeQuantLinear for vision, the merger.mlp. modules also raise an error.
- The current code does not seem to be able to build a model containing EffcientFakeQuantLinear for both vision and language at the same time.
Hello, and thank you for your interest in our work and for pointing this out. We carefully checked the code and confirmed that this bug does exist. It has now been fixed; please take a look.
Replies:
1. quant_objects: [vision, language] was a deprecated design and has been removed in the update.
2. Previously the vision part was not shown explicitly before deploy_fake_quant_model; this has been updated.
3. The merger part is treated as the projector, which llmc did not quantize at the time. The update fixes the bug triggered by it being left unquantized. If you need the projector quantized, you can first run with it unquantized and then write a naive quantization script to convert the projector.
4. Yes, this is possible: vision and language need to be configured separately; see the updated reference config.
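As a rough illustration of point 3, a naive post-hoc script could walk the saved fake-quant model and fake-quantize only the projector (merger) Linear weights in place. The `"merger"` name filter below is an assumption based on the module names mentioned in this thread, not an official llmc API:

```python
import torch
import torch.nn as nn

def naive_quant_projector(model: nn.Module, keyword: str = "merger", n_bits: int = 8) -> None:
    """Symmetric per-channel fake-quantize every Linear whose module name contains `keyword`."""
    qmax = 2 ** (n_bits - 1) - 1
    for name, module in model.named_modules():
        if keyword in name and isinstance(module, nn.Linear):
            w = module.weight.data
            scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / qmax
            module.weight.data = torch.round(w / scale).clamp(-qmax - 1, qmax) * scale
```

You would run this after loading the save_fake checkpoint and then re-save the model; all non-projector modules are left untouched.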
Thanks, I'll pull the updated code and try it later. Thank you!