LLaMA-Factory

When running full-parameter pre-training (stage pt) on Bloom, the loss approaches 0 after 60 steps

Open: wqc007 opened this issue 6 months ago • 1 comment

Reminder

  • [X] I have read the README and searched the existing issues.

Reproduction

nohup deepspeed --num_gpus 8 --master_port=5544 src/train_bash.py \
    --deepspeed ds_config.json \
    --stage pt \
    --do_train \
    --model_name_or_path bigscience/bloom-7b1 \
    --dataset wiki_demo \
    --finetuning_type full \
    --output_dir path_to_pt_checkpoint \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 1 \
    --save_steps 200 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --max_steps 200 \
    --fp16
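
For reference, here is a rough sketch of the arithmetic these flags imply. It assumes the usual Hugging Face Trainer semantics: plain data parallelism across all 8 GPUs, and `--max_steps` taking precedence over `--num_train_epochs` when both are given. The variable and function names are illustrative only and are not part of LLaMA-Factory.

```python
# Values copied from the reproduction command above.
# All names here are illustrative, not LLaMA-Factory APIs.
num_gpus = 8
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
max_steps = 200          # assumed to override --num_train_epochs 3.0
collapse_step = 60       # step at which the loss is reported to approach 0


def effective_batch_size(gpus: int, per_device: int, accum: int) -> int:
    """Sequences consumed per optimizer step under plain data parallelism."""
    return gpus * per_device * accum


global_batch = effective_batch_size(
    num_gpus, per_device_train_batch_size, gradient_accumulation_steps
)
sequences_at_collapse = global_batch * collapse_step
sequences_total = global_batch * max_steps

print(f"sequences per optimizer step: {global_batch}")
print(f"sequences seen by step {collapse_step}: {sequences_at_collapse}")
print(f"sequences seen by step {max_steps}: {sequences_total}")
```

Under these assumptions each optimizer step consumes 128 sequences, so the run has processed roughly 7,680 sequences by the time the loss collapses at step 60.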

Expected behavior

When running full-parameter pt on Bloom, the loss should decrease normally, as it does in the other stages.

System Info

No response

Others

No response

wqc007, Dec 25 '23 07:12