
[Help] <size mismatch for embedding.weight> copying a param with shape torch.Size([8, 229376]) from checkpoint, the shape in current model is torch.Size([8, 4096]).

MrWuzy1994 opened this issue 2 years ago · 3 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

Please help! After fine-tuning the int4 model and running it, I get the following error:

Some weights of ChatGLMForConditionalGeneration were not initialized from the model checkpoint at /home/luban/chatglm-6b-int4 and are newly initialized: ['transformer.prefix_encoder.trans.0.weight', 'transformer.prefix_encoder.trans.2.weight', 'transformer.prefix_encoder.trans.2.bias', 'transformer.prefix_encoder.embedding.weight', 'transformer.prefix_encoder.trans.0.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
  File "/home/luban/ChatGLM-6B/cli_demo.py", line 18, in <module>
    model.transformer.prefix_encoder.load_state_dict(new_prefix_state_dict)
  File "/home/luban/miniconda3/envs/chatglm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PrefixEncoder:
    Missing key(s) in state_dict: "trans.0.weight", "trans.0.bias", "trans.2.weight", "trans.2.bias".
    size mismatch for embedding.weight: copying a param with shape torch.Size([8, 229376]) from checkpoint, the shape in current model is torch.Size([8, 4096]).

Expected Behavior

No response

Steps To Reproduce

Run the model after fine-tuning.

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

No response

MrWuzy1994 · May 17 '23 14:05

My understanding is that some inference-time parameters (such as source_prefix and prefix_projection) need to match the ones used during training, e.g. config = AutoConfig.from_pretrained(path, trust_remote_code=True, pre_seq_len=128, source_prefix='add your prefix here if you used one', prefix_projection=True)
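
To make that concrete, here is a minimal sketch of the kind of loading code the traceback points at (cli_demo.py calling model.transformer.prefix_encoder.load_state_dict). The checkpoint directory name and the PRE_SEQ_LEN / PREFIX_PROJECTION values are placeholders; they have to be set to whatever was actually used in the fine-tuning run.

```python
import os
import torch
from transformers import AutoConfig, AutoModel, AutoTokenizer

# Placeholder paths: the base model from this thread and a hypothetical P-tuning output dir.
MODEL_PATH = "/home/luban/chatglm-6b-int4"
CHECKPOINT_PATH = "output/checkpoint-3000"

# These two must match the values used during fine-tuning, otherwise the PrefixEncoder
# built here will not have the same parameter shapes as the saved checkpoint.
PRE_SEQ_LEN = 128          # set to the pre_seq_len used for training
PREFIX_PROJECTION = False  # set to whether prefix projection was enabled for training

config = AutoConfig.from_pretrained(MODEL_PATH, trust_remote_code=True,
                                    pre_seq_len=PRE_SEQ_LEN,
                                    prefix_projection=PREFIX_PROJECTION)
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModel.from_pretrained(MODEL_PATH, config=config, trust_remote_code=True)

# Keep only the prefix-encoder weights from the fine-tuned checkpoint and strip the
# "transformer.prefix_encoder." prefix so they load into model.transformer.prefix_encoder.
prefix_state_dict = torch.load(os.path.join(CHECKPOINT_PATH, "pytorch_model.bin"),
                               map_location="cpu")
new_prefix_state_dict = {}
for k, v in prefix_state_dict.items():
    if k.startswith("transformer.prefix_encoder."):
        new_prefix_state_dict[k[len("transformer.prefix_encoder."):]] = v
model.transformer.prefix_encoder.load_state_dict(new_prefix_state_dict)

model = model.eval()
```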

teanon · May 20 '23 09:05

> My understanding is that some inference-time parameters (such as source_prefix and prefix_projection) need to match the ones used during training, e.g. config = AutoConfig.from_pretrained(path, trust_remote_code=True, pre_seq_len=128, source_prefix='add your prefix here if you used one', prefix_projection=True)

I added that and it still doesn't work. I specified prefix_projection and the ptuning checkpoint.

CyanMystery · Jun 01 '23 03:06

The pre_seq_len in config = AutoConfig.from_pretrained(model_path, trust_remote_code=True, pre_seq_len=576) has to match the value used when training the model; give that a try.
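
If you're not sure what values were used during training, one way to check is to inspect the saved prefix-encoder weights directly. This is a hedged sketch; the checkpoint path is hypothetical and should point at the pytorch_model.bin produced by your P-tuning run.

```python
import torch

# Hypothetical path to the P-tuning output; adjust to your own checkpoint directory.
ckpt = torch.load("output/checkpoint-3000/pytorch_model.bin", map_location="cpu")

# Print the shape of every saved prefix-encoder parameter.
for k, v in ckpt.items():
    if k.startswith("transformer.prefix_encoder."):
        print(k, tuple(v.shape))
```

For ChatGLM-6B, an embedding.weight second dimension of 4096 (hidden_size) indicates the run used prefix_projection=True, while 229376 (num_layers * 2 * hidden_size = 28 * 2 * 4096) indicates prefix_projection=False; the first dimension is the pre_seq_len. The [8, 229376] shape in the error above therefore suggests the checkpoint was trained with pre_seq_len=8 and prefix_projection disabled, so the inference config should use those same settings.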

starevelyn · Jun 02 '23 02:06