Qwen2.5
Questions about padding some of the Qwen2-72B weight parameters during model quantization
Hi @jklj077, I saw that in #576 you used padding to fix the mismatch between the AWQ-quantized weights of models such as Qwen2-72B and the vLLM kernel sizes. The code you used to solve the problem was:

```python
import torch
from torch.nn import functional as F
from transformers import AutoModelForCausalLM

# must use AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-72B-Instruct", torch_dtype="auto")

# this size is Qwen2-72B only
pad_size = 128

sd = model.state_dict()

for i, k in enumerate(sd):
    v = sd[k]
    print(k, i)
    # interleaving the padded zeros
    if ('mlp.up_proj.weight' in k) or ('mlp.gate_proj.weight' in k):
        prev_v = F.pad(v.unsqueeze(1), (0, 0, 0, 1, 0, 0)).reshape(29568 * 2, -1)[:pad_size * 2]
        new_v = torch.cat([prev_v, v[pad_size:]], dim=0)
        sd[k] = new_v
    elif 'mlp.down_proj.weight' in k:
        prev_v = F.pad(v.unsqueeze(2), (0, 1)).reshape(8192, 29568 * 2)[:, :pad_size * 2]
        new_v = torch.cat([prev_v, v[:, pad_size:]], dim=1)
        sd[k] = new_v

# this is a very large file; make sure your RAM is enough to load the model
torch.save(sd, '/path/to/padded_model/pytorch_model.bin')
```
From what I can tell, your code pads the first 128 dimensions of some of the weight tensors in an interleaved fashion. I'd like to learn from you what the rationale for, or the benefit of, padding this way is. Could you help explain?
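For anyone else reading, here is a minimal toy sketch I put together to show what the `up_proj`/`gate_proj` branch does mechanically. The sizes here are made up (`intermediate=4, hidden=3, pad_size=2` in place of the real 29568/8192/128); the transform is the same: each of the first `pad_size` rows gets a zero row inserted after it, so the result has `intermediate + pad_size` rows.

```python
import torch
from torch.nn import functional as F

# Toy stand-ins for the real Qwen2-72B sizes (29568 / 8192 / 128)
intermediate, hidden, pad_size = 4, 3, 2

v = torch.arange(1.0, intermediate * hidden + 1).reshape(intermediate, hidden)

# Same transform as the up_proj/gate_proj branch above:
# add a middle axis, pad it with one zero "row", flatten it back,
# and keep only the first pad_size (row, zero-row) pairs.
prev_v = F.pad(v.unsqueeze(1), (0, 0, 0, 1, 0, 0)).reshape(intermediate * 2, -1)[:pad_size * 2]
new_v = torch.cat([prev_v, v[pad_size:]], dim=0)

print(new_v)
# tensor([[ 1.,  2.,  3.],
#         [ 0.,  0.,  0.],
#         [ 4.,  5.,  6.],
#         [ 0.,  0.,  0.],
#         [ 7.,  8.,  9.],
#         [10., 11., 12.]])
# shape: (intermediate + pad_size, hidden)
```

The `down_proj` branch does the same thing along dim 1 (columns) instead of dim 0, so the intermediate dimension is padded consistently on both sides of the MLP.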