ChatGLM-6B

[BUG/Help] Resuming training from a checkpoint (断点续训)

Open · Chevalier1024 opened this issue 2 years ago · 2 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

06/20/2023 22:24:19 - WARNING - trainer - There were missing keys in the checkpoint model loaded: ['transformer.word_embeddings.weight', 'transformer.layers.0.input_layernorm.weight', 'transformer.layers.0.input_layernorm.bias', 'transformer.layers.0.attention.query_key_value.bias', 'transformer.layers.0.attention.query_key_value.weight', 'transformer.layers.0.attention.query_key_value.weight_scale', 'transformer.layers.0.attention.dense.bias', 'transformer.layers.0.attention.dense.weight', 'transformer.layers.0.attention.dense.weight_scale', 'transformer.layers.0.post_attention_layernorm.weight', 'transformer.layers.0.post_attention_layernorm.bias', 'transformer.layers.0.mlp.dense_h_to_4h.bias', 'transformer.layers.0.mlp.dense_h_to_4h.weight', 'transformer.layers.0.mlp.dense_h_to_4h.weight_scale', 'transformer.layers.0.mlp.dense_4h_to_h.bias', 'transformer.layers.0.mlp.dense_4h_to_h.weight', 'transformer.layers.0.mlp.dense_4h_to_h.weight_scale', 'transformer.layers.1.input_layernorm.weight', 'transformer.layers.1.input_layernorm.bias', 'transformer.layers.1.attention.query_key_value.bias', 'transformer.layers.1.attention.query_key_value.weight', 'transformer.layers.1.attention.query_key_value.weight_scale', 'transformer.layers.1.attention.dense.bias', 'transformer.layers.1.attention.dense.weight', 'transformer.layers.1.attention.dense.weight_scale', 'transformer.layers.1.post_attention_layernorm.weight', 'transformer.layers.1.post_attention_layernorm.bias', 'transformer.layers.1.mlp.dense_h_to_4h.bias', 'transformer.layers.1.mlp.dense_h_to_4h.weight', 'transformer.layers.1.mlp.dense_h_to_4h.weight_scale', 'transformer.layers.1.mlp.dense_4h_to_h.bias', 'transformer.layers.1.mlp.dense_4h_to_h.weight', 'transformer.layers.1.mlp.dense_4h_to_h.weight_scale', 'transformer.layers.2.input_layernorm.weight', 'transformer.layers.2.input_layernorm.bias', 'transformer.layers.2.attention.query_key_value.bias', 'transformer.layers.2.attention.query_key_value.weight', 'transformer.layers.2.attention.query_key_value.weight_scale', 'transformer.layers.2.attention.dense.bias', 'transformer.layers.2.attention.dense.weight', 'transformer.layers.2.attention.dense.weight_scale', 'transformer.layers.2.post_attention_layernorm.weight', 'transformer.layers.2.post_attention_layernorm.bias', 'transformer.layers.2.mlp.dense_h_to_4h.bias', 'transformer.layers.2.mlp.dense_h_to_4h.weight', 'transformer.layers.2.mlp.dense_h_to_4h.weight_scale', 'transformer.layers.2.mlp.dense_4h_to_h.bias', 'transformer.layers.2.mlp.dense_4h_to_h.weight', 'transformer.layers.2.mlp.dense_4h_to_h.weight_scale', 'transformer.layers.3.input_layernorm.weight', 'transformer.layers.3.input_layernorm.bias', 'transformer.layers.3.attention.query_key_value.bias', 'transformer.layers.3.attention.query_key_value.weight', 'transformer.layers.3.attention.query_key_value.weight_scale', 'transformer.layers.3.attention.de
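
The warning seems to indicate that the checkpoint passed to the Trainer does not contain the transformer weights themselves, which would be expected if the P-tuning checkpoint stores only the prefix-encoder tensors. A minimal sanity check to see what the checkpoint actually holds (the path below is just the checkpoint directory from the command further down, treat it as an assumption):

```python
# Sanity check (path is an assumption based on the training command below):
# list the keys stored in the checkpoint to see whether it only holds the
# prefix-encoder (P-tuning) weights rather than the full transformer.
import torch

ckpt = "output/adgen-chatglm-6b-pt-128-2e-2-v5/checkpoint-6000/pytorch_model.bin"
state_dict = torch.load(ckpt, map_location="cpu")

print(f"{len(state_dict)} tensors in checkpoint")
for key in sorted(state_dict):
    print(key)
```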

Expected Behavior

No response

Steps To Reproduce

PRE_SEQ_LEN=128
LR=2e-2

python3 -m torch.distributed.launch \
    --nproc_per_node=2 \
    main.py \
    --do_train \
    --do_eval \
    --train_file data/train3.json \
    --validation_file data/dev3.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path /dockerdata/peakchen/models--THUDM--chatglm-6b/snapshots/1d240ba371910e9282298d4592532d7f0f3e9f3e \
    --output_dir output/adgen-chatglm-6b-pt-$PRE_SEQ_LEN-$LR-v5 \
    --ptuning_checkpoint output/adgen-chatglm-6b-pt-$PRE_SEQ_LEN-$LR-v5/checkpoint-6000 \
    --resume_from_checkpoint output/adgen-chatglm-6b-pt-$PRE_SEQ_LEN-$LR-v5/checkpoint-6000 \
    --max_source_length 512 \
    --max_target_length 512 \
    --val_max_target_length 512 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 16 \
    --gradient_accumulation_steps 2 \
    --predict_with_generate \
    --max_steps 9000 \
    --logging_steps 10 \
    --save_steps 500 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4
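
For context, the checkpoints produced by P-tuning normally contain only the prefix-encoder weights, which is why they are usually loaded into the model separately rather than through the Trainer's resume logic. A rough sketch of that loading step, following the pattern from the ChatGLM-6B P-tuning README (the paths and pre_seq_len below mirror the command above and are assumptions for illustration):

```python
# Sketch: load only the prefix-encoder weights from a P-tuning checkpoint
# into the base ChatGLM-6B model (paths mirror the command above).
import os
import torch
from transformers import AutoConfig, AutoModel

MODEL_PATH = "/dockerdata/peakchen/models--THUDM--chatglm-6b/snapshots/1d240ba371910e9282298d4592532d7f0f3e9f3e"
CHECKPOINT_DIR = "output/adgen-chatglm-6b-pt-128-2e-2-v5/checkpoint-6000"

config = AutoConfig.from_pretrained(MODEL_PATH, trust_remote_code=True, pre_seq_len=128)
model = AutoModel.from_pretrained(MODEL_PATH, config=config, trust_remote_code=True)

# The checkpoint is expected to hold only 'transformer.prefix_encoder.*' tensors;
# every other weight (the "missing keys" in the warning) comes from the base model.
prefix_state_dict = torch.load(
    os.path.join(CHECKPOINT_DIR, "pytorch_model.bin"), map_location="cpu"
)
new_prefix_state_dict = {
    k[len("transformer.prefix_encoder."):]: v
    for k, v in prefix_state_dict.items()
    if k.startswith("transformer.prefix_encoder.")
}
model.transformer.prefix_encoder.load_state_dict(new_prefix_state_dict)
```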

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

No response

Chevalier1024 · Jun 21 '23 07:06

Hello, I have run into the same problem. Have you managed to solve it?

mockyd · Oct 15 '23 20:10

Hello, I have also encountered this problem. Have you solved it?

NicholasEinstein · Mar 02 '24 13:03