
Some weights of GPTNeoXForCausalLM were not initialized from the model checkpoint

Open wangzhao88 opened this issue 1 year ago • 5 comments

Hi: after training the RLHF model (actor: pythia-6.9b, reward model: pythia-410M), I evaluated the saved checkpoint with https://github.com/EleutherAI/lm-evaluation-harness. However, it seems that some weights are missing. Here is the log:

Some weights of GPTNeoXForCausalLM were not initialized from the model checkpoint at /mnt/data/wangzhao3/DeepSpeedExamples/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/output/actor and are newly initialized: ['gpt_neox.layers.18.attention.rotary_emb.inv_freq', 'gpt_neox.layers.31.attention.masked_bias', 'gpt_neox.layers.26.attention.masked_bias', 'gpt_neox.layers.22.attention.masked_bias', 'gpt_neox.layers.1.attention.masked_bias', 'gpt_neox.layers.22.attention.rotary_emb.inv_freq', 'gpt_neox.layers.20.attention.masked_bias', 'gpt_neox.layers.9.attention.rotary_emb.inv_freq', 'gpt_neox.layers.31.attention.bias', 'gpt_neox.layers.0.attention.rotary_emb.inv_freq', 'gpt_neox.layers.10.attention.bias', 'gpt_neox.layers.11.attention.masked_bias', 'gpt_neox.layers.8.attention.masked_bias', 'gpt_neox.layers.25.attention.rotary_emb.inv_freq', 'gpt_neox.layers.19.attention.rotary_emb.inv_freq', 'gpt_neox.layers.21.attention.bias', 'gpt_neox.layers.5.attention.rotary_emb.inv_freq', 'gpt_neox.layers.23.attention.bias', 'gpt_neox.layers.8.attention.bias', 'gpt_neox.layers.17.attention.bias', 'gpt_neox.layers.2.attention.masked_bias', 'gpt_neox.layers.4.attention.bias', 'gpt_neox.layers.15.attention.rotary_emb.inv_freq', 'gpt_neox.layers.4.attention.masked_bias', 'gpt_neox.layers.12.attention.rotary_emb.inv_freq', 'gpt_neox.layers.5.attention.masked_bias', 'gpt_neox.layers.7.attention.masked_bias', 'gpt_neox.layers.19.attention.bias', 'gpt_neox.layers.19.attention.masked_bias', 'gpt_neox.layers.1.attention.rotary_emb.inv_freq', 'gpt_neox.layers.17.attention.rotary_emb.inv_freq', 'gpt_neox.layers.14.attention.masked_bias', 'gpt_neox.layers.15.attention.masked_bias', 'gpt_neox.layers.30.attention.rotary_emb.inv_freq', 'gpt_neox.layers.10.attention.rotary_emb.inv_freq', 'gpt_neox.layers.3.attention.rotary_emb.inv_freq', 'gpt_neox.layers.3.attention.masked_bias', 'gpt_neox.layers.20.attention.rotary_emb.inv_freq', 'gpt_neox.layers.8.attention.rotary_emb.inv_freq', 'gpt_neox.layers.7.attention.rotary_emb.inv_freq', 'gpt_neox.layers.23.attention.rotary_emb.inv_freq', 'gpt_neox.layers.29.attention.rotary_emb.inv_freq', 'gpt_neox.layers.28.attention.rotary_emb.inv_freq', 'gpt_neox.layers.26.attention.rotary_emb.inv_freq', 'gpt_neox.layers.16.attention.masked_bias', 'gpt_neox.layers.6.attention.rotary_emb.inv_freq', 'gpt_neox.layers.5.attention.bias', 'gpt_neox.layers.21.attention.rotary_emb.inv_freq', 'gpt_neox.layers.27.attention.rotary_emb.inv_freq', 'gpt_neox.layers.28.attention.masked_bias', 'gpt_neox.layers.20.attention.bias', 'gpt_neox.layers.2.attention.rotary_emb.inv_freq', 'gpt_neox.layers.11.attention.rotary_emb.inv_freq', 'gpt_neox.layers.12.attention.bias', 'gpt_neox.layers.13.attention.bias', 'gpt_neox.layers.30.attention.masked_bias', 'gpt_neox.layers.24.attention.bias', 'gpt_neox.layers.24.attention.rotary_emb.inv_freq', 'gpt_neox.layers.24.attention.masked_bias', 'gpt_neox.layers.28.attention.bias', 'gpt_neox.layers.17.attention.masked_bias', 'gpt_neox.layers.18.attention.bias', 'gpt_neox.layers.9.attention.bias', 'gpt_neox.layers.27.attention.masked_bias', 'gpt_neox.layers.13.attention.masked_bias', 'gpt_neox.layers.23.attention.masked_bias', 'gpt_neox.layers.16.attention.bias', 'gpt_neox.layers.12.attention.masked_bias', 'gpt_neox.layers.9.attention.masked_bias', 'gpt_neox.layers.15.attention.bias', 'gpt_neox.layers.22.attention.bias', 'gpt_neox.layers.29.attention.masked_bias', 'gpt_neox.layers.31.attention.rotary_emb.inv_freq', 'gpt_neox.layers.1.attention.bias', 'gpt_neox.layers.14.attention.bias', 'gpt_neox.layers.10.attention.masked_bias', 'gpt_neox.layers.11.attention.bias', 'gpt_neox.layers.30.attention.bias', 'gpt_neox.layers.21.attention.masked_bias', 'gpt_neox.layers.25.attention.masked_bias', 'gpt_neox.layers.13.attention.rotary_emb.inv_freq', 'gpt_neox.layers.4.attention.rotary_emb.inv_freq', 'gpt_neox.layers.18.attention.masked_bias', 'gpt_neox.layers.27.attention.bias', 'gpt_neox.layers.6.attention.bias', 'gpt_neox.layers.3.attention.bias', 'gpt_neox.layers.0.attention.masked_bias', 'gpt_neox.layers.0.attention.bias', 'gpt_neox.layers.25.attention.bias', 'gpt_neox.layers.2.attention.bias', 'gpt_neox.layers.16.attention.rotary_emb.inv_freq', 'gpt_neox.layers.14.attention.rotary_emb.inv_freq', 'gpt_neox.layers.6.attention.masked_bias', 'gpt_neox.layers.29.attention.bias', 'gpt_neox.layers.26.attention.bias', 'gpt_neox.layers.7.attention.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

The score of the RLHF model is similar to that of the SFT model. Can you help me? Thanks.

wangzhao88 avatar Jul 10 '23 03:07 wangzhao88
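To narrow this down outside of lm-evaluation-harness, here is a minimal sketch (not from the original report) that loads the saved actor directly with transformers and prints which keys the loader considers missing; these are exactly the entries that trigger the "newly initialized" warning. The checkpoint path is the one from the log above; replace it with your own output directory.

```python
# Minimal sketch: load the step-3 actor checkpoint with plain transformers and
# inspect the loading report instead of relying on the warning text.
from transformers import AutoModelForCausalLM

# Path taken from the log above -- adjust to your own output directory.
ckpt = "/mnt/data/wangzhao3/DeepSpeedExamples/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/output/actor"

# output_loading_info=True returns a dict alongside the model with the keys
# that were missing from / unexpected in the checkpoint file.
model, loading_info = AutoModelForCausalLM.from_pretrained(
    ckpt, output_loading_info=True
)

print("missing keys:", loading_info["missing_keys"])
print("unexpected keys:", loading_info["unexpected_keys"])
```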

https://github.com/EleutherAI/lm-evaluation-harness

wangzhao88 avatar Jul 10 '23 03:07 wangzhao88

I ran into exactly the same issue, except that I am using LLaMA. Did you find the correct way to load the trained reward model?

qiancheng99 avatar Jul 17 '23 08:07 qiancheng99

I have the same issue when loading the SFT LLM. I found it is because custom modules cannot be saved via named_parameters. Did you solve the problem? @wangzhao88 @qiancheng99

robotsp avatar Aug 07 '23 15:08 robotsp
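For context on that point: every name in the warning is a registered buffer rather than a learnable parameter (attention.bias is the causal mask, attention.masked_bias a constant fill value, rotary_emb.inv_freq the rotary frequencies), and all of them are recomputed when the model is constructed. If the checkpoint is written from model.named_parameters() only, as robotsp suggests, those buffer entries are never saved, which matches the warning. A rough sketch of that gap follows; pythia-70m is used only because it is small, and the exact buffer set depends on your transformers version.

```python
# Sketch of the suspected failure mode: a state dict built from
# named_parameters() alone drops registered buffers, which from_pretrained
# then reports as "newly initialized".
from transformers import GPTNeoXForCausalLM

model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-70m")

full_keys = set(model.state_dict().keys())            # what from_pretrained expects
param_keys = {n for n, _ in model.named_parameters()}  # what a parameters-only save writes

dropped = sorted(full_keys - param_keys)
# Under transformers ~4.29 this lists attention.bias, attention.masked_bias and
# rotary_emb.inv_freq for every layer -- the same names as in the warning.
# Newer releases keep these as non-persistent buffers, so the set may be empty.
print(len(dropped), "keys not covered by named_parameters(); e.g.", dropped[:3])
```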

same issue

Luoxiaohei41 avatar Nov 08 '23 08:11 Luoxiaohei41

Same issue here. Updating transformers 4.29 -> 4.36.2 fixed it for me!

lauraaisling avatar Jan 05 '24 17:01 lauraaisling
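That appears consistent with the keys in the warning being buffers that newer transformers releases no longer expect to find in the checkpoint, so the warning simply disappears after the upgrade. A quick way to confirm which version the evaluation environment is actually using (4.36.2 is just the version reported above as working, not a hard requirement from the maintainers):

```python
# Check the installed transformers version before re-running the evaluation.
from packaging import version
import transformers

if version.parse(transformers.__version__) < version.parse("4.36.2"):
    print("transformers", transformers.__version__,
          "- consider: pip install -U 'transformers>=4.36.2'")
else:
    print("transformers", transformers.__version__, "- OK")
```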