
Error after deployment: size mismatch for transformer.word_embeddings.weight: copying a param with shape torch.Size([18816, 12288]) from checkpoint, the shape in current model is torch.Size([150528, 12288]).

yihuaxiang opened this issue · 5 comments

The error message is below. Why does this happen? I did not change any parameters.

(py39) [root@iZt4neiya2cung434kemk2Z GLM-130B]# bash scripts/generate.sh --input-source interactive
WARNING: No training data specified
using world size: 1 and model-parallel size: 8

padded vocab (size: 150528) with 0 dummy tokens (new size: 150528)
initializing model parallel with size 8
Set tokenizer as a icetk-glm-130B tokenizer! Now you can get_tokenizer() everywhere.
global rank 0 is loading checkpoint /root/130b/glm-130b-sat/49300/mp_rank_00_model_states.pt
Traceback (most recent call last):
  File "/root/GLM-130B/generate.py", line 215, in <module>
    main(args)
  File "/root/GLM-130B/generate.py", line 160, in main
    model, tokenizer = initialize_model_and_tokenizer(args)
  File "/root/GLM-130B/initialize.py", line 72, in initialize_model_and_tokenizer
    load_checkpoint(model, args)
  File "/root/.conda/envs/py39/lib/python3.9/site-packages/SwissArmyTransformer/training/model_io.py", line 181, in load_checkpoint
    missing_keys, unexpected_keys = module.load_state_dict(sd['module'], strict=False)
  File "/root/.conda/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for GLM130B:
    size mismatch for transformer.word_embeddings.weight: copying a param with shape torch.Size([18816, 12288]) from checkpoint, the shape in current model is torch.Size([150528, 12288]).
    size mismatch for transformer.layers.0.attention.query_key_value.weight: copying a param with shape torch.Size([4608, 12288]) from checkpoint, the shape in current model is torch.Size([36864, 12288]).
    size mismatch for transformer.layers.0.attention.query_key_value.bias: copying a param with shape torch.Size([4608]) from checkpoint, the shape in current model is torch.Size([36864]).
    size mismatch for transformer.layers.0.attention.dense.weight: copying a param with shape torch.Size([12288, 1536]) from checkpoint, the shape in current model is torch.Size([12288, 12288]).
    size mismatch for transformer.layers.0.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([12288, 4096]) from checkpoint, the shape in current model is torch.Size([12288, 32768]).
    size mismatch for transformer.layers.0.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([8192, 12288]) from checkpoint, the shape in current model is torch.Size([65536, 12288]).
    size mismatch for transformer.layers.0.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([8192]) from checkpoint, the shape in current model is torch.Size([65536]).
    [the same six size-mismatch messages repeat for transformer.layers.1 through transformer.layers.16; log truncated]

yihuaxiang · Jun 04 '23

same problem

micelvrice · Jun 06 '23

The mistake is obvious from the log: "using world size: 1 and model-parallel size: 8". The checkpoint file you load is a single tensor-parallel shard holding 1/8 of the weights, while the model you instantiate has the full dimensions, so the load cannot succeed: size mismatch for transformer.word_embeddings.weight: copying a param with shape torch.Size([18816, 12288]) from checkpoint, the shape in current model is torch.Size([150528, 12288]). Note that 18816 is exactly 150528 / 8: only 1/8 of the weights were loaded.
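To see the 1/8 factor concretely, here is a minimal sketch that inspects one shard directly (the path is the one from the log above; the 'module' key comes from the SwissArmyTransformer traceback, and only CPU plus torch are needed):

```python
import torch

TP = 8               # model-parallel size the checkpoint was sharded for
FULL_VOCAB = 150528  # padded vocab size reported in the log

# One tensor-parallel shard (path taken from the log above).
sd = torch.load("/root/130b/glm-130b-sat/49300/mp_rank_00_model_states.pt",
                map_location="cpu")["module"]

emb = sd["transformer.word_embeddings.weight"]
print(emb.shape)         # torch.Size([18816, 12288]) -- one shard only
print(FULL_VOCAB // TP)  # 18816, i.e. exactly 1/8 of the embedding rows
```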

chenboheng · Jun 06 '23

Thanks!

yihuaxiang · Jun 06 '23

Hi, mine says: using world size: 4 and model-parallel size: 4

padded vocab (size: 150528) with 0 dummy tokens (new size: 150528)
initializing model parallel with size 4
Set tokenizer as a icetk-glm-130B tokenizer! Now you can get_tokenizer() everywhere.
global rank 3 is loading checkpoint /chatglm-130b/tar/glm-130b-sat/49300/mp_rank_03_model_states.pt
global rank 1 is loading checkpoint /chatglm-130b/tar/glm-130b-sat/49300/mp_rank_01_model_states.pt
global rank 0 is loading checkpoint /chatglm-130b/tar/glm-130b-sat/49300/mp_rank_00_model_states.pt
global rank 2 is loading checkpoint /chatglm-130b/tar/glm-130b-sat/49300/mp_rank_02_model_states.pt

But it reports a similar error:

size mismatch for transformer.layers.68.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([12288, 4096]) from checkpoint, the shape in current model is torch.Size([12288, 8192]).
size mismatch for transformer.layers.68.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([8192, 12288]) from checkpoint, the shape in current model is torch.Size([16384, 12288]).
size mismatch for transformer.layers.68.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([8192]) from checkpoint, the shape in current model is torch.Size([16384]).
size mismatch for transformer.layers.69.attention.query_key_value.weight: copying a param with shape torch.Size([4608, 12288]) from checkpoint, the shape in current model is torch.Size([9216, 12288]).
size mismatch for transformer.layers.69.attention.query_key_value.bias: copying a param with shape torch.Size([4608]) from checkpoint, the shape in current model is torch.Size([9216]).
size mismatch for transformer.layers.69.attention.dense.weight: copying a param with shape torch.Size([12288, 1536]) from checkpoint, the shape in current model is torch.Size([12288, 3072]).
size mismatch for transformer.layers.69.mlp.dense_4h_to_h.weight: copying a param with shape torch.Size([12288, 4096]) from checkpoint, the shape in current model is torch.Size([12288, 8192]).
size mismatch for transformer.layers.69.mlp.dense_h_to_4h.weight: copying a param with shape torch.Size([8192, 12288]) from checkpoint, the shape in current model is torch.Size([16384, 12288]).
size mismatch for transformer.layers.69.mlp.dense_h_to_4h.bias: copying a param with shape torch.Size([8192]) from checkpoint, the shape in current model is torch.Size([16384]).

Why is that? Thanks for any help.

Nann4n · Jun 27 '23

This is explained in docs/quantization.md: the checkpoint you get after extracting the archive is sharded for 8 GPUs. To run on fewer GPUs, e.g. 4, you first have to re-shard it with:

python tools/convert_tp.py --input-folder <SRC_CKPT_PATH> --output-folder <DST_CKPT_PATH> --target-tp 4

where <SRC_CKPT_PATH> is the path to the extracted checkpoint and <DST_CKPT_PATH> is the path where the converted checkpoint will be written.
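As a quick sanity check after conversion (a sketch assuming the converted shards keep the same 'module' state-dict layout as the original checkpoint; the path is a placeholder for wherever convert_tp.py wrote the output), each of the 4 shards should now hold 1/4 of the full dimensions:

```python
import torch

# Placeholder: substitute the --output-folder you passed to convert_tp.py.
sd = torch.load("<DST_CKPT_PATH>/mp_rank_00_model_states.pt",
                map_location="cpu")["module"]

# Expect 150528 / 4 = 37632 embedding rows per shard ...
print(sd["transformer.word_embeddings.weight"].shape)
# ... and 36864 / 4 = 9216 fused-QKV rows, matching the tp=4 model above.
print(sd["transformer.layers.0.attention.query_key_value.weight"].shape)
```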

newbby123 · Jul 13 '23