
Question about stage-2 pre-training of the model

Open longeryeah opened this issue 2 years ago • 6 comments

Hi, I'm running stage-2 pre-training with run_clm_pt_with_peft.py, also on A100 (40G) machines, with a sequence length of 512 and the parameters from the pre-training script you provide. The output shows trainable params: 183828480 || all params: 6660100096 || trainable%: 2.760145903969309, i.e. only 2.76% of parameters are trainable, far below the 6.06% reported in your paper. Also, with 3 GPUs the largest batch size I can fit is 2, nowhere near your 1024. What could be causing this?

longeryeah avatar May 11 '23 11:05 longeryeah

Before calling trainer.train(), print the names of all trainable (requires_grad) parameters in the model and check whether embed_tokens and lm_head are among them. Also, are you using the specified version of peft?
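This check can be sketched as follows (the helper name is made up for illustration; it is duck-typed, so it works with any object exposing named_parameters(), as a PEFT-wrapped torch model does):

```python
def print_trainable_parameters(model):
    """Print every parameter name with requires_grad=True and the trainable ratio.

    Works with any object whose named_parameters() yields (name, param) pairs,
    e.g. a (PEFT-wrapped) torch.nn.Module.
    """
    trainable_names = []
    trainable, total = 0, 0
    for name, param in model.named_parameters():
        n = param.numel()
        total += n
        if param.requires_grad:
            trainable += n
            trainable_names.append(name)
    for name in trainable_names:
        print(name)
    print(f"trainable params: {trainable} || all params: {total} || "
          f"trainable%: {100 * trainable / total}")
    return trainable_names
```

Call it right before trainer.train(); if base_model.model.model.embed_tokens.weight and base_model.model.lm_head.weight are missing from the output, the embedding and head weights are not being trained.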

You can raise the effective batch size with gradient accumulation (gradient_accumulation_steps): e.g. with a per-GPU batch size of 2 on 3 GPUs, the total batch size is 2 * 3 * gradient_accumulation_steps.
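The arithmetic above can be written out explicitly (a trivial sketch; the function name is made up for illustration):

```python
def total_train_batch_size(per_device_batch_size, num_devices,
                           gradient_accumulation_steps):
    # Effective batch size per optimizer update: each device contributes
    # per_device_batch_size samples per forward pass, and gradients are
    # accumulated over gradient_accumulation_steps passes before each update.
    return per_device_batch_size * num_devices * gradient_accumulation_steps

# With per-device batch size 2 on 3 GPUs, accumulating over 170 steps gives
# a total batch size of 1020 (1024 is not divisible by 2 * 3 = 6).
print(total_train_batch_size(2, 3, 170))
```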

airaria avatar May 11 '23 11:05 airaria

1. The printed parameters include base_model.model.model.embed_tokens.weight and base_model.model.lm_head.weight, so the corresponding parameters appear to be there. 2. peft was downloaded from the version link in your repo and installed with python setup.py install. 3. So the batch size you report is the total accumulated via gradient_accumulation_steps, right? I'm asking because I'm worried something in my configuration is wrong.

longeryeah avatar May 11 '23 13:05 longeryeah

  1. If the printed parameters all look normal, it should be fine. Please paste the full list of trainable parameter names so I can take a look.
  2. Yes, it is the total train batch size.

airaria avatar May 11 '23 14:05 airaria

Here are the trainable parameter names. The per-layer LoRA entries repeat an identical pattern for all 32 layers, compressed here with brace notation instead of listing every layer:

base_model.model.model.embed_tokens.weight
base_model.model.model.layers.{0..31}.self_attn.{q,k,v,o}_proj.lora_{A,B}.weight
base_model.model.model.layers.{0..31}.mlp.{gate,down,up}_proj.lora_{A,B}.weight
base_model.model.lm_head.weight

longeryeah avatar May 11 '23 14:05 longeryeah

The trainable parameters all look normal. But your model's total parameter count doesn't match what I measure: for llama-7b I get roughly 6.9B, while yours shows 6.6B.

airaria avatar May 11 '23 14:05 airaria

Thanks a lot. 1. I did a quick calculation: my vocab size is 32000, and if I add roughly 0.3B to the trainable parameter count, the trainable percentage comes out close to 6.06%. So it does look like a problem with the embed_tokens and lm_head parameters. 2. I'd also like to ask: are there any tricks for choosing the number of training steps, or is it just a matter of watching the loss?
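The back-of-envelope check in point 1 can be reproduced as follows (a sketch assuming LLaMA-7B's hidden size of 4096 and untied embedding/head weights; the helper name is made up):

```python
def embed_and_lm_head_params(vocab_size, hidden_size=4096):
    # embed_tokens has shape (vocab_size, hidden_size) and lm_head has shape
    # (hidden_size, vocab_size), so together they contribute
    # 2 * vocab_size * hidden_size parameters.
    return 2 * vocab_size * hidden_size

print(embed_and_lm_head_params(32000))  # 262144000, i.e. roughly 0.26B
```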

longeryeah avatar May 12 '23 01:05 longeryeah

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.

github-actions[bot] avatar May 19 '23 22:05 github-actions[bot]

Closing the issue, since no updates observed. Feel free to re-open if you need any further assistance.

github-actions[bot] avatar May 23 '23 22:05 github-actions[bot]