
Could someone help me look into the memory consumption of pretraining Baichuan-7B on 6x A800-80G GPUs?

Open · Smilefish1 opened this issue 1 year ago • 5 comments

Reminder

  • [X] I have read the README and searched the existing issues.

Reproduction

deepspeed --num_gpus 6 --master_port=9901 src/train_bash.py \
    --deepspeed ds_config.json \
    --train_id testetstets \
    --stage pt \
    --do_train \
    --model_name_or_path llm_model/Baichuan-7B \
    --dataset /pretrain_data \
    --finetuning_type full \
    --output_dir /test_pt/ \
    --overwrite_cache \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 1 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 2.0 \
    --plot_loss \
    --fp16

DeepSpeed config:

{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "zero_allow_untested_optimizer": true,
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "initial_scale_power": 16,
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 5e8,
    "reduce_scatter": true,
    "reduce_bucket_size": 5e8,
    "overlap_comm": false,
    "contiguous_gradients": true
  }
}

Expected behavior

According to the hardware requirements in the project README, full-parameter mixed-precision training of a 7B model should need about 160 GB; my training data is just the wiki_demo.txt shipped with the project. In practice, however, the memory usage is far higher than that. Does anyone know what causes this?

[screenshot 1704952751679]
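(For scale, a rough back-of-the-envelope check of that 160 GB figure, assuming standard mixed-precision Adam and ignoring activations: fp16 weights, fp32 master weights, Adam moments, and fp16 gradients come to roughly 16 bytes per parameter.)

# 2 B fp16 weights + 4 B fp32 master weights + 8 B Adam m/v + 2 B fp16 grads ≈ 16 B/param
echo "$((7 * 16)) GB"   # 7B params x 16 B ≈ 112 GB of model/optimizer state, before activations and buffers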

System Info

CUDA: 12.0, transformers: 4.34.0; everything else was installed from requirements.txt.

Others

No response

Smilefish1 · Jan 11 '24 06:01

It depends on the batch size, cutoff length, and so on.
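For example, these are the flags that control those settings (flag names as I recall them from the LLaMA-Factory CLI; please double-check against your version), to be added to the command above:

--per_device_train_batch_size 1 \
--gradient_accumulation_steps 1 \
--cutoff_len 512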

linchen111 · Jan 11 '24 07:01

It depends on the batch size, cutoff length, and so on.

My batch size is currently set to 1 and the gradient accumulation steps are also 1; the cutoff is the default 1024. Let me try setting the cutoff to 256.

Smilefish1 · Jan 11 '24 07:01

It depends on the batch size, cutoff length, and so on.

I just tried changing the cutoff length from 1024 to 512, and the VRAM used during pretraining didn't change.

[screenshot 1704959630013]

Smilefish1 · Jan 11 '24 07:01

Enable ZeRO-3.
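For context: the config above uses ZeRO stage 2, which partitions only optimizer states and gradients across the 6 GPUs, while the full fp16 parameters are still replicated on every card; stage 3 also partitions the parameters. A sketch of a stage-3 zero_optimization block to swap in for the one above (values taken from the generic DeepSpeed/Transformers examples, not tuned settings):

"zero_optimization": {
  "stage": 3,
  "overlap_comm": false,
  "contiguous_gradients": true,
  "sub_group_size": 1e9,
  "reduce_bucket_size": "auto",
  "stage3_prefetch_bucket_size": "auto",
  "stage3_param_persistence_threshold": "auto",
  "stage3_max_live_parameters": 1e9,
  "stage3_max_reuse_distance": 1e9,
  "stage3_gather_16bit_weights_on_model_save": true
}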

xll2001 · Jan 11 '24 10:01

Enable ZeRO-3.

OK, I'll give it a try. Thanks!

Smilefish1 · Jan 11 '24 10:01