[Bug]: Llama Format model produces incorrect output
Is there an existing issue?
- [X] I have searched, and there is no existing issue.
Describe the bug
I am running inference with the Llama Format model. With the prompt left at the default example (prompt="Now you act like a terminal situated within a beginner's C++ practice repository folder, please provide the output for the command: ls -l"),
the answer does not match expectations and even contains some garbled text. Specific output:
To Reproduce
Download "openbmb/MiniCPM-2B-dpo-bf16-llama-format" from Hugging Face, then run the MiniCPM-2B (Llama Format) script:

```python
import torch
from transformers import LlamaTokenizerFast, LlamaForCausalLM

model_path = "openbmb/MiniCPM-2B-dpo-bf16-llama-format"
tokenizer = LlamaTokenizerFast.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16, device_map='cuda', trust_remote_code=True)

prompt = "Now you act like a terminal situated within a beginner's C++ practice repository folder, please provide the output for the command: ls -l"
input_ids = tokenizer.encode("<用户>{}<AI>".format(prompt), return_tensors='pt', add_special_tokens=True).cuda()
responds = model.generate(input_ids, temperature=0.3, top_p=0.8, repetition_penalty=1.02, max_length=1024)
responds = tokenizer.decode(responds[0], skip_special_tokens=True)
print(responds)
```
The output is bad.
Expected behavior
The prompt should be answered correctly.
Screenshots
No response
Environment
- OS: ubuntu 20.04
- torch: 1.13.1+cu116
- torchvision: 0.14.1+cu116
- tokenizers: 0.15.2
- transformers: 4.36.0
- Device: A100
Additional context
Thanks!
I used the same example code and ran into the same problem. When the model loads the pretrained weights, I get the following warning:
Some weights of LlamaForCausalLM were not initialized from the model checkpoint at openbmb/MiniCPM-2B-dpo-bf16-llama-format and are newly initialized: ['lm_head.weight']
So I think `lm_head.weight` not being loaded correctly is what leads to the bad results.
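A quick way to confirm this diagnosis, as a minimal check assuming the model has already been loaded with the script above:

```python
import torch

# If the weights were tied correctly, lm_head and the input embedding
# should share storage (or at least hold equal values).
shares_storage = model.lm_head.weight.data_ptr() == model.model.embed_tokens.weight.data_ptr()
values_equal = torch.equal(model.lm_head.weight, model.model.embed_tokens.weight)
print("shares storage:", shares_storage)  # False here: lm_head was re-initialized
print("values equal:  ", values_equal)    # also False for a randomly initialized lm_head
```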
This problem may be caused by tie_weights for the embedding and lm_head. I guess MiniCPM reuses embedding.weight as lm_head.weight, but LlamaForCausalLM does not do that automatically.
The simplest way to fix this is running `model.lm_head.weight = model.model.embed_tokens.weight` (or `model.lm_head.weight = torch.nn.Parameter(model.model.embed_tokens.weight.clone())`) after model initialization.
Here is a blog that explains tie_weights in transformers. You may find it helpful.
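A minimal sketch of applying that fix, assuming the same loading code as in the report above:

```python
import torch
from transformers import LlamaTokenizerFast, LlamaForCausalLM

model_path = "openbmb/MiniCPM-2B-dpo-bf16-llama-format"
tokenizer = LlamaTokenizerFast.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16, device_map='cuda', trust_remote_code=True)

# Re-tie the output head to the input embeddings: the checkpoint does not
# ship a separate lm_head.weight, so LlamaForCausalLM initializes it randomly.
model.lm_head.weight = model.model.embed_tokens.weight
# Alternatively, copy instead of sharing storage:
# model.lm_head.weight = torch.nn.Parameter(model.model.embed_tokens.weight.clone())
```

With the head tied back to the embeddings, generation should follow the prompt instead of producing garbled output.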
It works for me! Thanks.