LLaMA-Factory
When running full-parameter pt with Bloom, the loss approaches 0 after about 60 steps
Reminder
- [X] I have read the README and searched the existing issues.
Reproduction
```shell
nohup deepspeed --num_gpus 8 --master_port=5544 src/train_bash.py \
    --deepspeed ds_config.json \
    --stage pt \
    --do_train \
    --model_name_or_path bigscience/bloom-7b1 \
    --dataset wiki_demo \
    --finetuning_type full \
    --output_dir path_to_pt_checkpoint \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 1 \
    --save_steps 200 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --max_steps 200 \
    --fp16
```
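The `ds_config.json` passed to `--deepspeed` is not included in the report. A minimal ZeRO stage-2 fp16 configuration it might resemble (a hypothetical sketch, not the reporter's actual file) is:

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "initial_scale_power": 16
  },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
```

The `"auto"` values defer to the Trainer's command-line arguments, so the micro-batch size, gradient accumulation, and fp16 settings stay consistent with the launch command above.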
Expected behavior
When running full-parameter pt with Bloom, the loss should decrease normally, as it does in the other stages.
System Info
No response
Others
No response