LLaMA-Factory
When running full-parameter pt with Bloom, the loss approaches 0 after about 60 steps
Reminder
- [X] I have read the README and searched the existing issues.
Reproduction
```shell
nohup deepspeed --num_gpus 8 --master_port=5544 src/train_bash.py \
    --deepspeed ds_config.json \
    --stage pt \
    --do_train \
    --model_name_or_path bigscience/bloom-7b1 \
    --dataset wiki_demo \
    --finetuning_type full \
    --output_dir path_to_pt_checkpoint \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 1 \
    --save_steps 200 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --max_steps 200 \
    --fp16
```
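The `ds_config.json` passed to `--deepspeed` is not included in the report. A minimal ZeRO stage-2 fp16 configuration it might resemble (a hypothetical sketch, not the reporter's actual file) is:

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "initial_scale_power": 16
  },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
```

The `"auto"` values defer to the Trainer's command-line arguments, so the micro-batch size, gradient accumulation, and fp16 settings stay consistent with the launch command above.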
Expected behavior
When running full-parameter pt with Bloom, the loss should decrease normally, as it does in the other stages.
System Info
No response
Others
No response