DeepSpeedExamples
Some weights of GPTNeoXForCausalLM were not initialized from the model checkpoint
Hi: after training the RLHF model (actor: pythia-6.9b, reward model: pythia-410M), I evaluated the saved checkpoint with https://github.com/EleutherAI/lm-evaluation-harness. However, it seems that some weights are missing; here is the log:
Some weights of GPTNeoXForCausalLM were not initialized from the model checkpoint at
/mnt/data/wangzhao3/DeepSpeedExamples/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/output/actor
and are newly initialized: [the 'attention.bias', 'attention.masked_bias', and
'attention.rotary_emb.inv_freq' keys of 'gpt_neox.layers.0' through 'gpt_neox.layers.31',
96 keys in total]
You should probably TRAIN this model on a down-stream task to be able to use it for
predictions and inference.
The score of the RLHF model is similar to the SFT model. Can you help me? Thanks.
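If it helps, here is a minimal sketch (the checkpoint directory and file name are assumptions, and it assumes a single-shard pytorch_model.bin) to check that the keys in the warning are only registered buffers, which are recreated at load time, and that no trainable parameter is actually missing from the saved checkpoint:

```python
# Minimal sketch (paths and file name are assumptions): check whether the keys in the
# "newly initialized" warning are learnable parameters or just constant buffers, and
# whether every trainable parameter is actually present in the saved checkpoint.
import torch
from transformers import GPTNeoXForCausalLM

ckpt_dir = "output/actor"  # assumed path to the step-3 actor output
state = torch.load(f"{ckpt_dir}/pytorch_model.bin", map_location="cpu")  # assumes a single-shard checkpoint

model = GPTNeoXForCausalLM.from_pretrained(ckpt_dir)
param_names = {n for n, _ in model.named_parameters()}  # learnable weights only
buffer_names = {n for n, _ in model.named_buffers()}    # attention.bias / masked_bias / rotary_emb.inv_freq live here (exact set depends on the transformers version)

missing = param_names - set(state.keys())
print("trainable parameters missing from the checkpoint:", sorted(missing) or "none")

warned = [k for k in buffer_names
          if k.endswith(("attention.bias", "attention.masked_bias", "rotary_emb.inv_freq"))]
print(f"{len(warned)} of the warned keys are registered buffers, not trained weights")
```

If the first print reports "none", the warning only concerns constant buffers and the evaluation result should not be affected by missing weights.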
I hit exactly the same issue, except that I am using LLaMA. Did you find the correct way to load the trained reward model?
I have the same issue when loading the SFT LLM. I found it's because custom modules are not saved when iterating named_parameters(). Did you solve the problem? @wangzhao88 @qiancheng99
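If the root cause really is that only named_parameters() were written out, a minimal sketch of saving the full Hugging Face state dict instead could look like this (the function name, the .module unwrapping, and the non-ZeRO-3 assumption are mine, not from DeepSpeed-Chat):

```python
# Minimal sketch, assuming a non-ZeRO-3 setup where the underlying Hugging Face module
# is reachable via .module; saving the full state_dict() (parameters *and* buffers)
# instead of iterating named_parameters() keeps every key, including custom modules.
import os
import torch

def save_hf_checkpoint(model, tokenizer, output_dir):
    os.makedirs(output_dir, exist_ok=True)
    model_to_save = model.module if hasattr(model, "module") else model
    state_dict = model_to_save.state_dict()  # includes registered buffers
    torch.save(state_dict, os.path.join(output_dir, "pytorch_model.bin"))
    model_to_save.config.to_json_file(os.path.join(output_dir, "config.json"))
    tokenizer.save_pretrained(output_dir)    # tokenizer files needed by from_pretrained()
```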
same issue
Same issue here. Updating transformers 4.29 -> 4.36.2 fixed it for me!
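For anyone who lands here later, a quick sanity check that the installed transformers is at least the version the comment above reports as fixing this (a minimal sketch; the 4.36.2 bound is taken from that comment):

```python
# Sanity check: the comment above reports that upgrading transformers
# (4.29 -> 4.36.2) makes the "newly initialized" warning go away.
import transformers
from packaging import version  # packaging ships as a transformers dependency

installed = version.parse(transformers.__version__)
if installed < version.parse("4.36.2"):
    print(f"transformers {installed} detected; try: pip install -U 'transformers>=4.36.2'")
```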