### Is there an existing issue for this?

- [X] I have searched the existing issues
### Current Behavior

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /data/d2022/qs/wtz/ChatGLM-6B/ptuning/main.py:430 in <module>                                    │
│ │
│ 427 │
│ 428 │
│ 429 if __name__ == "__main__":                                                                   │
│ ❱ 430 │ main() │
│ 431 │
│ │
│ /data/d2022/qs/wtz/ChatGLM-6B/ptuning/main.py:369 in main │
│ │
│ 366 │ │ # checkpoint = last_checkpoint │
│ 367 │ │ model.gradient_checkpointing_enable() │
│ 368 │ │ model.enable_input_require_grads() │
│ ❱ 369 │ │ train_result = trainer.train(resume_from_checkpoint=checkpoint) │
│ 370 │ │ # trainer.save_model() # Saves the tokenizer too for easy upload │
│ 371 │ │ │
│ 372 │ │ metrics = train_result.metrics │
│ │
│ /data/d2022/qs/wtz/ChatGLM-6B/ptuning/trainer.py:1635 in train │
│ │
│ 1632 │ │ inner_training_loop = find_executable_batch_size( │
│ 1633 │ │ │ self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size │
│ 1634 │ │ ) │
│ ❱ 1635 │ │ return inner_training_loop( │
│ 1636 │ │ │ args=args, │
│ 1637 │ │ │ resume_from_checkpoint=resume_from_checkpoint, │
│ 1638 │ │ │ trial=trial, │
│ │
│ /data/d2022/qs/wtz/ChatGLM-6B/ptuning/trainer.py:1704 in _inner_training_loop │
│ │
│ 1701 │ │ │ or self.fsdp is not None │
│ 1702 │ │ ) │
│ 1703 │ │ if args.deepspeed: │
│ ❱ 1704 │ │ │ deepspeed_engine, optimizer, lr_scheduler = deepspeed_init( │
│ 1705 │ │ │ │ self, num_training_steps=max_steps, resume_from_checkpoint=resume_from_c │
│ 1706 │ │ │ ) │
│ 1707 │ │ │ self.model = deepspeed_engine.module │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: deepspeed_init() got an unexpected keyword argument 'resume_from_checkpoint'
Running tokenizer on train dataset: 4%|████▎ | 5/115 [00:04<01:34, 1.16ba/s][2023-07-18 09:39:05,315] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 59378
[2023-07-18 09:39:05,315] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 59379
[2023-07-18 09:39:07,453] [ERROR] [launch.py:324:sigkill_handler] ['/home/qs/anaconda3/bin/python', '-u', 'main.py', '--local_rank=1', '--deepspeed', 'deepspeed.json', '--do_train', '--train_file', 'AdvertiseGen/train.json', '--test_file', 'AdvertiseGen/dev.json', '--prompt_column', 'content', '--response_column', 'summary', '--overwrite_cache', '--model_name_or_path', 'THUDM/chatglm-6b', '--output_dir', './output/adgen-chatglm-6b-ft-1e-4', '--overwrite_output_dir', '--max_source_length', '64', '--max_target_length', '64', '--per_device_train_batch_size', '4', '--per_device_eval_batch_size', '1', '--gradient_accumulation_steps', '1', '--predict_with_generate', '--max_steps', '5000', '--logging_steps', '10', '--save_steps', '1000', '--learning_rate', '1e-4', '--fp16'] exits with return code = 1
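For context (an interpretation, not part of the original report): the `TypeError` indicates that the copy of `trainer.py` vendored in `ptuning/` passes a `resume_from_checkpoint` keyword that the installed transformers' `deepspeed_init` does not accept, i.e. a version mismatch between the vendored trainer and the installed library. A version-agnostic call guard can be sketched with the standard `inspect` module (the helper name `call_compat` and the stand-in `new_style_init` are hypothetical, for illustration only):

```python
import inspect

def call_compat(fn, *args, **kwargs):
    """Call fn, silently dropping keyword arguments it does not accept."""
    params = inspect.signature(fn).parameters
    takes_var_kw = any(p.kind is inspect.Parameter.VAR_KEYWORD
                       for p in params.values())
    if not takes_var_kw:
        # Keep only keywords that appear in fn's signature.
        kwargs = {k: v for k, v in kwargs.items() if k in params}
    return fn(*args, **kwargs)

# Stand-in for a deepspeed_init whose signature dropped resume_from_checkpoint:
def new_style_init(trainer, num_training_steps):
    return (trainer, num_training_steps)

# The unsupported keyword is discarded instead of raising TypeError:
result = call_compat(new_style_init, "trainer", 5000,
                     resume_from_checkpoint=None)
print(result)  # → ('trainer', 5000)
```

The more direct fix is simply to keep the vendored `trainer.py` and the installed `transformers` at matching versions, but a guard like this shows why the call site, not DeepSpeed itself, raises the error.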
### Expected Behavior

No response
### Steps To Reproduce

pip install deepspeed
bash ds_train_finetune.sh
### Environment

- OS: Ubuntu 18.04
- Python: 3.9.13
- Transformers: 4.27.1
- PyTorch: 11.7
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`): 11.7
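The version fields above can be captured directly rather than filled in by hand (a minimal sketch; the torch/transformers lines are commented out so it also runs where those packages are absent):

```python
import platform

# Interpreter and OS versions for the bug report.
print("Python:", platform.python_version())
print("OS:", platform.system(), platform.release())

# With torch and transformers installed, the remaining fields follow
# the same pattern:
# import torch, transformers
# print("Transformers:", transformers.__version__)
# print("PyTorch:", torch.__version__, "| CUDA build:", torch.version.cuda,
#       "| available:", torch.cuda.is_available())
```

Note that `torch.cuda.is_available()` prints `True`/`False`, not a CUDA version, and `torch.__version__` is the PyTorch release (e.g. 2.x), distinct from the CUDA build it was compiled against.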
### Anything else?

No response
w-tz · Jul 18 '23 01:07