LLaMA-Factory
Could someone please help me look into GPU memory usage when pretraining Baichuan-7B on 6× A800-80G?
Reminder
- [X] I have read the README and searched the existing issues.
Reproduction
```bash
deepspeed --num_gpus 6 --master_port=9901 src/train_bash.py \
    --deepspeed ds_config.json \
    --train_id testetstets \
    --stage pt \
    --do_train \
    --model_name_or_path llm_model/Baichuan-7B \
    --dataset /pretrain_data \
    --finetuning_type full \
    --output_dir /test_pt/ \
    --overwrite_cache \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 1 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 2.0 \
    --plot_loss \
    --fp16
```
DeepSpeed config:

```json
{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "zero_allow_untested_optimizer": true,
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "initial_scale_power": 16,
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 5e8,
    "reduce_scatter": true,
    "reduce_bucket_size": 5e8,
    "overlap_comm": false,
    "contiguous_gradients": true
  }
}
```
Expected behavior
According to the hardware requirements in the project README, full-parameter mixed-precision training of a 7B model should need about 160 GB of GPU memory; my training data is just the project's wiki_demo.txt. In practice, though, the memory usage is far higher than that. Does anyone know what might be causing this?
System Info
CUDA 12.0, transformers 4.34.0; everything else was installed according to requirements.txt.
Others
No response
It depends on the batch size, the cutoff length, and so on.
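As a rough sanity check (standard ZeRO accounting for mixed-precision Adam, not a measurement of this particular run): with ZeRO-2 the fp16 weights are replicated on every GPU while gradients and optimizer states are partitioned, so for $\Psi \approx 7\times10^9$ parameters on $N = 6$ GPUs the model states alone take roughly

$$
2\Psi + \frac{(2+12)\,\Psi}{N} \;\approx\; 14\,\text{GB} + \frac{98\,\text{GB}}{6} \;\approx\; 30\,\text{GB per GPU}.
$$

Summed over the node that is about $6 \times 14 + 98 \approx 182$ GB of model states before any activations, so exceeding the 160 GB figure under stage 2 would not be surprising; activation memory then scales with the per-device batch size and the cutoff length.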
My batch size is already set to 1 and gradient accumulation steps to 1, and the cutoff length is the default 1024. Let me try setting the cutoff to 256.
I just tried changing the cutoff length from 1024 to 512, and the GPU memory used by pretraining didn't change.
Enable ZeRO-3.
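For reference, a stage-3 variant of the config above might look like the sketch below. This is only a sketch: the stage3_* bucket sizes and thresholds are illustrative defaults rather than values tuned for this run. With ZeRO-3 the fp16 weights are partitioned as well, so per-GPU model states drop to roughly $16\Psi/N \approx 19$ GB in this setup.

```json
{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "zero_allow_untested_optimizer": true,
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "initial_scale_power": 16,
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": false,
    "contiguous_gradients": true,
    "reduce_bucket_size": 5e8,
    "sub_group_size": 1e9,
    "stage3_prefetch_bucket_size": 5e8,
    "stage3_param_persistence_threshold": 1e6,
    "stage3_max_live_parameters": 1e9,
    "stage3_max_reuse_distance": 1e9,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
```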
Okay, I'll give it a try, thanks!