
Qwen2.5-VL 3B calibration with AWQ takes too long

kritohyh opened this issue 5 months ago • 3 comments

With a calibration set of roughly 1k samples and batch size 16, AWQ calibration takes more than 13 hours, while llm-compressor finishes AWQ calibration in under 1 hour. What could explain the difference?
One possible reason is that the Qwen2.5-VL attention backend is slower when using SDPA. Are there other factors?
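One way to test the SDPA hypothesis is to time a forward pass under both attention backends. A minimal sketch using the Hugging Face transformers API, assuming flash-attn is installed; the model path and batch shape are placeholders and this is not LightCompress code:

import time
import torch
from transformers import Qwen2_5_VLForConditionalGeneration

MODEL_PATH = "Qwen/Qwen2.5-VL-3B-Instruct"  # placeholder; substitute the local path

def time_forward(attn_implementation, n_iters=10):
    # Load the model with the chosen attention backend ("sdpa" or "flash_attention_2").
    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        MODEL_PATH,
        torch_dtype=torch.bfloat16,
        attn_implementation=attn_implementation,
        device_map="cuda",
    ).eval()
    # Text-only dummy batch roughly matching the calib setup below (bs=16, seq_len=512).
    input_ids = torch.randint(0, 1000, (16, 512), device="cuda")
    with torch.inference_mode():
        model(input_ids=input_ids)              # warm-up
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(n_iters):
            model(input_ids=input_ids)
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_iters

for impl in ("sdpa", "flash_attention_2"):
    print(impl, f"{time_forward(impl):.3f} s per batch")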

Hardware platform: 1 × H20
Input token count per sample (text + image): ≈ 200

AWQ YAML config:

base:
    seed: &seed 42
model:
    type: Qwen2_5VL
    path: xxx
    tokenizer_mode: slow
    torch_dtype: auto
calib:
    name: custom_mm
    download: False
    path: xxx
    apply_chat_template: True
    n_samples: 960
    bs: 16
    seq_len: 512
    padding: True
    seed: *seed

quant:
    method: Awq
    weight:
        bit: 4
        symmetric: False
        granularity: per_group
        group_size: 64
        # Available options: ['gemm_pack']
        pack_version: gemm_pack
    special:
        trans: True
        trans_version: v2
        weight_clip: True
        do_gqa_trans: True
    quant_out: False
save:
    save_mlcllm: True
    save_fake: True
    save_path: xxx
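For reference, the weight settings above (4-bit, asymmetric, per-group with group_size 64) correspond to the following fake-quantization scheme. This is a minimal illustrative sketch of per-group asymmetric quantization, not LightCompress's actual implementation:

import torch

def fake_quant_per_group(w, n_bits=4, group_size=64):
    # Asymmetric per-group fake quantization of a 2-D weight [out_features, in_features].
    out_features, in_features = w.shape
    assert in_features % group_size == 0
    g = w.reshape(out_features, in_features // group_size, group_size)
    w_min = g.amin(dim=-1, keepdim=True)
    w_max = g.amax(dim=-1, keepdim=True)
    qmax = 2 ** n_bits - 1                                # 15 for 4-bit
    scale = (w_max - w_min).clamp(min=1e-5) / qmax        # one scale per group of 64
    zero = (-w_min / scale).round()                       # asymmetric zero point
    q = (g / scale + zero).round().clamp(0, qmax)         # quantize to [0, 15]
    return ((q - zero) * scale).reshape(out_features, in_features)  # dequantize back

w = torch.randn(128, 256)
w_q = fake_quant_per_group(w)
print("max abs error:", (w - w_q).abs().max().item())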

kritohyh • Aug 11 '25 02:08

I found that increasing the batch size gives a roughly linear speedup.
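For a rough sense of why the batch size matters so much here, the numbers reported above (about 960 samples, bs=16, more than 13 h) work out to roughly 13 minutes per calibration batch. A small back-of-the-envelope sketch, assuming per-batch time stays roughly constant as the batch grows:

# Figures from this thread: ~960 calib samples, bs=16, >13 h end to end.
n_samples, total_hours, base_bs = 960, 13.0, 16
minutes_per_batch = total_hours * 60 / (n_samples / base_bs)   # roughly 13 min per batch

# If per-batch time stays roughly flat as bs grows (the observation above),
# total time scales with the number of batches, n_samples / bs:
for bs in (16, 32, 64):
    est_hours = (n_samples / bs) * minutes_per_batch / 60
    print(f"bs={bs:3d}  ->  ~{est_hours:.1f} h")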

kritohyh • Aug 12 '25 13:08

How does type: Qwen2_5VL work? How do I get this model type to load?

shengqihailuo1 • Aug 28 '25 06:08

With type: Qwen2_5VL I get:

AttributeError: 'Qwen2_5_VLModel' object has no attribute 'layers'

Should these be self.model.model.embed_tokens and self.rotary_emb = self.model.model.rotary_emb, or do the attribute paths need to change?
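That AttributeError typically shows up when the installed transformers version places the Qwen2.5-VL decoder layers under a nested language_model module instead of directly on model.model. A minimal sketch to check where layers, embed_tokens, and rotary_emb live on a given install; the attribute paths and model path here are assumptions to verify, not LightCompress code:

import torch
from transformers import Qwen2_5_VLForConditionalGeneration

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-3B-Instruct",   # placeholder; substitute the local path
    torch_dtype=torch.bfloat16,
)

inner = model.model                   # Qwen2_5_VLModel
# Older transformers: layers / embed_tokens / rotary_emb sit directly on model.model.
# Newer transformers: they moved under model.model.language_model.
text_model = getattr(inner, "language_model", inner)
print(type(inner).__name__, "->", type(text_model).__name__)
print("has layers:      ", hasattr(text_model, "layers"))
print("has embed_tokens:", hasattr(text_model, "embed_tokens"))
print("has rotary_emb:  ", hasattr(text_model, "rotary_emb"))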

laobadao • Oct 29 '25 12:10