Bug: AWQ quantization fails with clip_version: v2
1. Current behavior: running AWQ quantization on DeepSeek-R1 with clip_version: v2 fails with the error below.

2. The awq_w_only.yml parameters are as follows:

base:
    seed: &seed 42
model:
    type: DeepseekV3
    path: /mnt/DeepSeek-R1
    tokenizer_mode: slow
    torch_dtype: auto
calib:
    name: pileval
    download: False
    path: /home/llmc/data/pileval
    n_samples: 128
    bs: -1
    seq_len: 512
    preproc: pileval_awq
    seed: *seed
eval:
    eval_pos: [pretrain, transformed, fake_quant]
    name: wikitext2
    download: False
    path: /home/llmc/data/wikitext2
    seq_len: 2048
    # For 7B / 13B model eval, bs can be set to "1", and inference_per_block can be set to "False".
    # For 70B model eval, bs can be set to "20", and inference_per_block can be set to "True".
    bs: 20
    inference_per_block: True
quant:
    method: Awq
    weight:
        bit: 4
        symmetric: True
        granularity: per_group
        group_size: 128
        calib_algo: learnable
    special:
        trans: True
        # The options for "trans_version" include "v1" and "v2".
        # But their results don't differ significantly.
        trans_version: v2
        weight_clip: True
        clip_version: v2
        # For 2-bit quantization, setting "clip_sym: False" will yield better results.
        clip_sym: True
        save_scale: True
        scale_path: /home/llmc/scale_data
        save_clip: True
        clip_path: /home/llmc/clip_data
save:
    save_trans: False
    save_fake: False
    save_path: /home/llmc/deepseek_quat
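For context on what the per_group settings above imply: with granularity: per_group and group_size: 128, each row of a weight matrix is split into groups of 128 channels and every group gets its own scale, so the scales tensor has one value per group rather than per element. A minimal standalone sketch of symmetric per-group fake quantization (illustrative only, not llmc's implementation; the function and variable names here are made up):

```python
import torch

def fake_quant_per_group(w: torch.Tensor, group_size: int = 128, bit: int = 4) -> torch.Tensor:
    # Sketch of symmetric per-group weight fake-quantization (quantize then dequantize).
    out_features, in_features = w.shape
    qmax = 2 ** (bit - 1) - 1                               # 7 for 4-bit symmetric
    w_g = w.reshape(-1, group_size)                         # one row per group
    scales = (w_g.abs().amax(dim=1, keepdim=True) / qmax).clamp(min=1e-8)  # shape (n_rows * n_groups, 1)
    q = torch.clamp(torch.round(w_g / scales), -qmax - 1, qmax)
    return (q * scales).reshape(out_features, in_features)

w_dq = fake_quant_per_group(torch.randn(3584, 3584))
```

Note the keepdim=True: the scales keep a trailing singleton dimension so they broadcast against the grouped weight view, which is exactly the shape property missing in the failing call below.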
3. Error traceback:

[rank0]:   File "/home/llmc/llmc/compression/quantization/base_blockwise_quantization.py", line 453, in run
[rank0]:     self.block_transform(block, input_feat, self.input['kwargs'])
[rank0]:   File "/home/wangke/miniconda3/envs/llmc/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/home/llmc/llmc/compression/quantization/awq.py", line 294, in block_transform
[rank0]:     self.auto_clipper.run(
[rank0]:   File "/home/wangke/miniconda3/envs/llmc/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/home/llmc/llmc/compression/quantization/auto_clip.py", line 68, in run
[rank0]:     max_val, min_val = self.auto_clip_layer(
[rank0]:   File "/home/miniconda3/envs/llmc/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/home/llmc/llmc/compression/quantization/auto_clip.py", line 161, in auto_clip_layer
[rank0]:     q_w = self.fake_quantize_weight(
[rank0]:   File "/home/llmc/llmc/compression/quantization/auto_clip.py", line 271, in fake_quantize_weight
[rank0]:     q_w = self.wquantizer.fake_quant_weight_static(w, args)
[rank0]:   File "/home/llmc/llmc/compression/quantization/quant.py", line 814, in fake_quant_weight_static
[rank0]:     q_weight = self.quant_dequant(
[rank0]:   File "/home/llmc/llmc/compression/quantization/quant.py", line 715, in quant_dequant
[rank0]:     tensor = self.quant(tensor, scales, zeros, qmax, qmin)
[rank0]:   File "/home/llmc/llmc/compression/quantization/quant.py", line 701, in quant
[rank0]:     tensor = torch.clamp(self.round_func(tensor / scales) + zeros, qmin, qmax)
[rank0]: RuntimeError: The size of tensor a (3584) must match the size of tensor b (56) at non-singleton dimension 2
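The failure itself is a plain broadcasting mismatch: in quant(), the weight tensor still carries the full channel dimension (3584) while scales only has one entry per group (56), so the element-wise division cannot broadcast. The same RuntimeError can be reproduced outside llmc with shapes taken from the log (these are not llmc's actual intermediate shapes, just ones that trigger the identical message):

```python
import torch

w = torch.randn(1, 8, 3584)          # last dim: full channel dimension
scales = torch.rand(1, 8, 56) + 1.0  # last dim: one scale per group, not broadcastable
torch.round(w / scales)
# RuntimeError: The size of tensor a (3584) must match the size of
# tensor b (56) at non-singleton dimension 2
```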
4. Proposed fix: debugging shows the problem is a shape mismatch between the weight tensor and scales. Adding scales.reshape(-1, 1) in fake_quant_weight_static in quant.py resolves the error. May I submit a PR for this?
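For clarity, this is roughly what the suggested workaround amounts to, sketched outside llmc (the group_size, shapes, and variable names here are illustrative, not taken from quant.py): view the weight per group and reshape the scales to (-1, 1) so the two broadcast before quant/dequant.

```python
import torch

group_size = 64                       # illustrative: 3584 channels / 56 groups = 64 per group
w = torch.randn(8, 3584)
scales = torch.rand(8, 56) + 1.0      # one scale per group, as in the failing call

# torch.round(w / scales) would fail with the mismatch above. Reshaping fixes it:
w_g = w.reshape(-1, group_size)       # (8 * 56, 64)
scales_g = scales.reshape(-1, 1)      # (8 * 56, 1) -- the suggested scales.reshape(-1, 1)
q = torch.clamp(torch.round(w_g / scales_g), -8, 7)   # 4-bit signed range
w_dq = (q * scales_g).reshape_as(w)
```

Whether the reshape belongs inside fake_quant_weight_static itself or earlier in the clip path is a maintainer call; the snippet only shows why the (-1, 1) shape makes the broadcast valid.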