ChatGLM-6B
[BUG/Help] Running chatglm-6b-int4 on macOS fails with: copying a param with shape torch.Size([12288, 2048]) from checkpoint, the shape in current model is torch.Size([12288, 4096])
Is there an existing issue for this?
- [X] I have searched the existing issues
Current Behavior
size mismatch for transformer.layers.0.attention.query_key_value.weight: copying a param with shape torch.Size([12288, 2048]) from checkpoint, the shape in current model is torch.Size([12288, 4096]).
size mismatch for transformer.layers.0.attention.dense.weight: copying a param with shape torch.Size([4096, 2048]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for transformer.layers.0.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([16384, 2048]) from checkpoint, the shape in current model is torch.Size([16384, 4096]).
size mismatch for transformer.layers.0.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([4096, 8192]) from checkpoint, the shape in current model is torch.Size([4096, 16384]).
size mismatch for transformer.layers.1.attention.query_key_value.weight: copying a param with shape torch.Size([12288, 2048]) from checkpoint, the shape in current model is torch.Size([12288, 4096]).
size mismatch for transformer.layers.1.attention.dense.weight: copying a param with shape torch.Size([4096, 2048]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for transformer.layers.1.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([16384, 2048]) from checkpoint, the shape in current model is torch.Size([16384, 4096]).
size mismatch for transformer.layers.1.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([4096, 8192]) from checkpoint, the shape in current model is torch.Size([4096, 16384]).
Expected Behavior
No response
Steps To Reproduce
- Cloned the chatglm-6b-int4 project
- Modified modeling_chatglm.py to comment out the quantization step:
```
# from .quantization import quantize, QuantizedEmbedding, QuantizedLinear, load_cpu_kernel
# self.transformer = quantize(self.transformer, bits, use_quantization_cache=use_quantization_cache, empty_init=empty_init, **kwargs)
```
- Modified cli_demo.py to load the local checkpoint on CPU:
```
tokenizer = AutoTokenizer.from_pretrained("../chatglm-6b-int4", trust_remote_code=True)
model = AutoModel.from_pretrained("../chatglm-6b-int4", trust_remote_code=True).float()
# tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
# model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
```
Environment
- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :
Anything else?
No response
I ran into the same problem. Windows 11, Python 3.9, with both torch 2.0.0 and torch 1.13.1. Transformers (compiled from latest git source), CPU: AMD R7-6800H.
chatglm-6b-int4 contains quantized weights, so it must be loaded with the quantized model structure.
```
# from .quantization import quantize, QuantizedEmbedding, QuantizedLinear, load_cpu_kernel
# self.transformer = quantize(self.transformer, bits, use_quantization_cache=use_quantization_cache, empty_init=empty_init, **kwargs)
```
After commenting out this code, the model structure is still the un-quantized one, so loading the INT4 checkpoint fails with these size mismatches. You could instead try loading the full-precision chatglm-6b weights.
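A minimal sketch of the two consistent ways to load on a CPU-only machine (assuming the original, unmodified modeling_chatglm.py shipped with each checkpoint; the model IDs below are the upstream THUDM Hugging Face repos, so adjust the paths if you load from a local clone):

```
from transformers import AutoTokenizer, AutoModel

# Option 1: load the INT4 checkpoint with its quantized model structure
# (leave the quantize() call in modeling_chatglm.py untouched).
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).float()

# Option 2: load the full-precision chatglm-6b checkpoint, whose weight shapes
# match the un-quantized structure (needs considerably more RAM).
# tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
# model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).float()

model = model.eval()
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
```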