
[BUG/Help] Running `bash ds_train_finetune.sh` throws an error

Open songmzhang opened this issue 2 years ago • 4 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

Running `bash ds_train_finetune.sh` directly fails with the following error:

```
Traceback (most recent call last):
  File "/data/zhangsm/chatglm/ChatGLM2-6B/ptuning/main.py", line 411, in <module>
    main()
  File "/data/zhangsm/chatglm/ChatGLM2-6B/ptuning/main.py", line 350, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/data/zhangsm/chatglm/ChatGLM2-6B/ptuning/trainer.py", line 1635, in train
    return inner_training_loop(
  File "/data/zhangsm/chatglm/ChatGLM2-6B/ptuning/trainer.py", line 1704, in _inner_training_loop
    deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
TypeError: deepspeed_init() got an unexpected keyword argument 'resume_from_checkpoint'
```

Even after removing the `resume_from_checkpoint` argument in trainer.py, rerunning hits another error:

```
Traceback (most recent call last):
  File "/data/zhangsm/chatglm/ChatGLM2-6B/ptuning/main.py", line 411, in <module>
    main()
  File "/data/zhangsm/chatglm/ChatGLM2-6B/ptuning/main.py", line 350, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/data/zhangsm/chatglm/ChatGLM2-6B/ptuning/trainer.py", line 1635, in train
    return inner_training_loop(
  File "/data/zhangsm/chatglm/ChatGLM2-6B/ptuning/trainer.py", line 1704, in _inner_training_loop
    deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
  File "/data/zhangsm/anaconda3/envs/chatglm/lib/python3.10/site-packages/transformers/deepspeed.py", line 340, in deepspeed_init
    hf_deepspeed_config = trainer.accelerator.state.deepspeed_plugin.hf_ds_config
AttributeError: 'Seq2SeqTrainer' object has no attribute 'accelerator'
```
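Both errors point at an API mismatch between the repo's bundled `ptuning/trainer.py` and the installed `transformers`. One way to confirm such a mismatch before training is to probe whether a callable accepts a given keyword argument. This is a diagnostic sketch: the `deepspeed_init` below is a stand-in whose signature is inferred from the TypeError above, not the real transformers function.

```python
import inspect

def accepts_kwarg(fn, name):
    """Return True if callable `fn` can be invoked with keyword argument `name`."""
    params = inspect.signature(fn).parameters
    return name in params or any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )

# Stand-in mirroring the newer signature implied by the TypeError above;
# the real function lives in transformers' deepspeed integration module.
def deepspeed_init(trainer, num_training_steps, inference=False):
    ...

print(accepts_kwarg(deepspeed_init, "resume_from_checkpoint"))  # False
print(accepts_kwarg(deepspeed_init, "num_training_steps"))      # True
```

In a real environment you would probe the actual function from the module path shown in the traceback (`.../site-packages/transformers/deepspeed.py`) instead of the stand-in.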

Expected Behavior

No response

Steps To Reproduce

  1. replace "THUDM/chatglm2-6b" with the local path "/data/zhangsm/chatglm/chatglm2-6b"
  2. bash ds_train_finetune.sh

Environment

- Python: 3.10
- Transformers: 4.30.2
- PyTorch: 1.13
- CUDA Support: True

Anything else?

No response

songmzhang · Jul 04 '23 09:07

I ran into the same problem. It looks like the bundled trainer is incompatible with newer versions of transformers.

BobZhang321 · Jul 04 '23 09:07

transformers 4.29.2 still has the `resume_from_checkpoint` parameter.
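A minimal version-gate sketch based on this observation. The assumption (taken from the comment above, not independently verified) is that 4.29.2 is the last transformers release whose `deepspeed_init` still matches the bundled trainer.py, while the reporter's 4.30.2 does not:

```python
def parse_version(v):
    """Parse a dotted version string like '4.30.2' into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

# Assumed cutoff per the comment above; releases newer than this are
# assumed to have changed the deepspeed_init API.
LAST_KNOWN_GOOD = "4.29.2"

def is_compatible(installed):
    """True if the installed transformers version predates the API change."""
    return parse_version(installed) <= parse_version(LAST_KNOWN_GOOD)

print(is_compatible("4.30.2"))  # False: matches the failing environment above
print(is_compatible("4.29.2"))  # True
```

In practice you would feed `transformers.__version__` into `is_compatible` and either downgrade transformers or update the repo code when it returns False.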

abandonever · Jul 04 '23 09:07

Can ds_train_finetune.sh run on a single GPU? How much VRAM does it need?

abandonever · Jul 04 '23 09:07

Related link: an open-source reproduction of instruction fine-tuning for Tsinghua's chatGLM2-6B model (released 2023-07-04 18:00): https://github.com/THUDM/ChatGLM2-6B. It supports training on both GPU and CPU, and accepts both the chatGLM-6B fine-tuning data format and the alpaca instruction-tuning data format. For CPU training I used unquantized fp32 fine-tuning, which is very memory-hungry, so I recommend training on GPU. I will finish editing and uploading the related docs soon. Full support for the Tsinghua GLM community!

lilongxian · Jul 04 '23 10:07

It's a version issue.

Siegfried-qgf · Jul 05 '23 08:07

You can pull the latest code from the repo.

duzx16 · Jul 05 '23 13:07