
Some weights of GPTNeoXForCausalLM were not initialized from the model checkpoint

Open wangzhao88 opened this issue 1 year ago • 5 comments

Hi: after training the RLHF model (actor: pythia-6.9b, reward model: pythia-410M), I evaluated the saved checkpoint with https://github.com/EleutherAI/lm-evaluation-harness. However, it seems that some weights are missing. Here is the log:

Some weights of GPTNeoXForCausalLM were not initialized from the model checkpoint at /mnt/data/wangzhao3/DeepSpeedExamples/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/output/actor and are newly initialized: ['gpt_neox.layers.18.attention.rotary_emb.inv_freq', 'gpt_neox.layers.31.attention.masked_bias', 'gpt_neox.layers.26.attention.masked_bias', 'gpt_neox.layers.22.attention.masked_bias', 'gpt_neox.layers.1.attention.masked_bias', 'gpt_neox.layers.22.attention.rotary_emb.inv_freq', 'gpt_neox.layers.20.attention.masked_bias', 'gpt_neox.layers.9.attention.rotary_emb.inv_freq', 'gpt_neox.layers.31.attention.bias', 'gpt_neox.layers.0.attention.rotary_emb.inv_freq', 'gpt_neox.layers.10.attention.bias', 'gpt_neox.layers.11.attention.masked_bias', 'gpt_neox.layers.8.attention.masked_bias', 'gpt_neox.layers.25.attention.rotary_emb.inv_freq', 'gpt_neox.layers.19.attention.rotary_emb.inv_freq', 'gpt_neox.layers.21.attention.bias', 'gpt_neox.layers.5.attention.rotary_emb.inv_freq', 'gpt_neox.layers.23.attention.bias', 'gpt_neox.layers.8.attention.bias', 'gpt_neox.layers.17.attention.bias', 'gpt_neox.layers.2.attention.masked_bias', 'gpt_neox.layers.4.attention.bias', 'gpt_neox.layers.15.attention.rotary_emb.inv_freq', 'gpt_neox.layers.4.attention.masked_bias', 'gpt_neox.layers.12.attention.rotary_emb.inv_freq', 'gpt_neox.layers.5.attention.masked_bias', 'gpt_neox.layers.7.attention.masked_bias', 'gpt_neox.layers.19.attention.bias', 'gpt_neox.layers.19.attention.masked_bias', 'gpt_neox.layers.1.attention.rotary_emb.inv_freq', 'gpt_neox.layers.17.attention.rotary_emb.inv_freq', 'gpt_neox.layers.14.attention.masked_bias', 'gpt_neox.layers.15.attention.masked_bias', 'gpt_neox.layers.30.attention.rotary_emb.inv_freq', 'gpt_neox.layers.10.attention.rotary_emb.inv_freq', 'gpt_neox.layers.3.attention.rotary_emb.inv_freq', 'gpt_neox.layers.3.attention.masked_bias', 'gpt_neox.layers.20.attention.rotary_emb.inv_freq', 'gpt_neox.layers.8.attention.rotary_emb.inv_freq', 'gpt_neox.layers.7.attention.rotary_emb.inv_freq', 'gpt_neox.layers.23.attention.rotary_emb.inv_freq', 'gpt_neox.layers.29.attention.rotary_emb.inv_freq', 'gpt_neox.layers.28.attention.rotary_emb.inv_freq', 'gpt_neox.layers.26.attention.rotary_emb.inv_freq', 'gpt_neox.layers.16.attention.masked_bias', 'gpt_neox.layers.6.attention.rotary_emb.inv_freq', 'gpt_neox.layers.5.attention.bias', 'gpt_neox.layers.21.attention.rotary_emb.inv_freq', 'gpt_neox.layers.27.attention.rotary_emb.inv_freq', 'gpt_neox.layers.28.attention.masked_bias', 'gpt_neox.layers.20.attention.bias', 'gpt_neox.layers.2.attention.rotary_emb.inv_freq', 'gpt_neox.layers.11.attention.rotary_emb.inv_freq', 'gpt_neox.layers.12.attention.bias', 'gpt_neox.layers.13.attention.bias', 'gpt_neox.layers.30.attention.masked_bias', 'gpt_neox.layers.24.attention.bias', 'gpt_neox.layers.24.attention.rotary_emb.inv_freq', 'gpt_neox.layers.24.attention.masked_bias', 'gpt_neox.layers.28.attention.bias', 'gpt_neox.layers.17.attention.masked_bias', 'gpt_neox.layers.18.attention.bias', 'gpt_neox.layers.9.attention.bias', 'gpt_neox.layers.27.attention.masked_bias', 'gpt_neox.layers.13.attention.masked_bias', 'gpt_neox.layers.23.attention.masked_bias', 'gpt_neox.layers.16.attention.bias', 'gpt_neox.layers.12.attention.masked_bias', 'gpt_neox.layers.9.attention.masked_bias', 'gpt_neox.layers.15.attention.bias', 'gpt_neox.layers.22.attention.bias', 'gpt_neox.layers.29.attention.masked_bias', 'gpt_neox.layers.31.attention.rotary_emb.inv_freq', 'gpt_neox.layers.1.attention.bias', 'gpt_neox.layers.14.attention.bias', 'gpt_neox.layers.10.attention.masked_bias', 'gpt_neox.layers.11.attention.bias', 'gpt_neox.layers.30.attention.bias', 'gpt_neox.layers.21.attention.masked_bias', 'gpt_neox.layers.25.attention.masked_bias', 'gpt_neox.layers.13.attention.rotary_emb.inv_freq', 'gpt_neox.layers.4.attention.rotary_emb.inv_freq', 'gpt_neox.layers.18.attention.masked_bias', 'gpt_neox.layers.27.attention.bias', 'gpt_neox.layers.6.attention.bias', 'gpt_neox.layers.3.attention.bias', 'gpt_neox.layers.0.attention.masked_bias', 'gpt_neox.layers.0.attention.bias', 'gpt_neox.layers.25.attention.bias', 'gpt_neox.layers.2.attention.bias', 'gpt_neox.layers.16.attention.rotary_emb.inv_freq', 'gpt_neox.layers.14.attention.rotary_emb.inv_freq', 'gpt_neox.layers.6.attention.masked_bias', 'gpt_neox.layers.29.attention.bias', 'gpt_neox.layers.26.attention.bias', 'gpt_neox.layers.7.attention.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

The score of the RLHF model is similar to that of the SFT model. Can you help me? Thanks.

wangzhao88 avatar Jul 10 '23 03:07 wangzhao88
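To narrow this down outside of lm-evaluation-harness, here is a minimal sketch (not from the original report) that loads the saved actor directly with transformers and prints which keys the loader considers missing; these are exactly the entries that trigger the "newly initialized" warning. The checkpoint path is the one from the log above; replace it with your own output directory.

```python
# Minimal sketch: load the step-3 actor checkpoint with plain transformers and
# inspect the loading report instead of relying on the warning text.
from transformers import AutoModelForCausalLM

# Path taken from the log above -- adjust to your own output directory.
ckpt = "/mnt/data/wangzhao3/DeepSpeedExamples/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/output/actor"

# output_loading_info=True returns a dict alongside the model with the keys
# that were missing from / unexpected in the checkpoint file.
model, loading_info = AutoModelForCausalLM.from_pretrained(
    ckpt, output_loading_info=True
)

print("missing keys:", loading_info["missing_keys"])
print("unexpected keys:", loading_info["unexpected_keys"])
```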

https://github.com/EleutherAI/lm-evaluation-harness

wangzhao88 avatar Jul 10 '23 03:07 wangzhao88

I ran into exactly the same issue, except that I am using LLaMA. Did you find the correct way to load the trained reward model?

qiancheng99 avatar Jul 17 '23 08:07 qiancheng99

I have the same issue when loading the SFT LLM. I found it is because custom modules cannot be saved via named_parameters. Did you solve the problem? @wangzhao88 @qiancheng99

robotsp avatar Aug 07 '23 15:08 robotsp
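For context on that point: every name in the warning is a registered buffer rather than a learnable parameter (attention.bias is the causal mask, attention.masked_bias a constant fill value, rotary_emb.inv_freq the rotary frequencies), and all of them are recomputed when the model is constructed. If the checkpoint is written from model.named_parameters() only, as robotsp suggests, those buffer entries are never saved, which matches the warning. A rough sketch of that gap follows; pythia-70m is used only because it is small, and the exact buffer set depends on your transformers version.

```python
# Sketch of the suspected failure mode: a state dict built from
# named_parameters() alone drops registered buffers, which from_pretrained
# then reports as "newly initialized".
from transformers import GPTNeoXForCausalLM

model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-70m")

full_keys = set(model.state_dict().keys())            # what from_pretrained expects
param_keys = {n for n, _ in model.named_parameters()}  # what a parameters-only save writes

dropped = sorted(full_keys - param_keys)
# Under transformers ~4.29 this lists attention.bias, attention.masked_bias and
# rotary_emb.inv_freq for every layer -- the same names as in the warning.
# Newer releases keep these as non-persistent buffers, so the set may be empty.
print(len(dropped), "keys not covered by named_parameters(); e.g.", dropped[:3])
```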

same issue

Luoxiaohei41 avatar Nov 08 '23 08:11 Luoxiaohei41

Same issue here. Updating transformers 4.29 -> 4.36.2 fixed it for me!

lauraaisling avatar Jan 05 '24 17:01 lauraaisling
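That appears consistent with the keys in the warning being buffers that newer transformers releases no longer expect to find in the checkpoint, so the warning simply disappears after the upgrade. A quick way to confirm which version the evaluation environment is actually using (4.36.2 is just the version reported above as working, not a hard requirement from the maintainers):

```python
# Check the installed transformers version before re-running the evaluation.
from packaging import version
import transformers

if version.parse(transformers.__version__) < version.parse("4.36.2"):
    print("transformers", transformers.__version__,
          "- consider: pip install -U 'transformers>=4.36.2'")
else:
    print("transformers", transformers.__version__, "- OK")
```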