
Could someone help me look into the memory consumption of pretraining Baichuan-7B on 6x A800-80G GPUs?

Open · Smilefish1 opened this issue 1 year ago • 5 comments

Reminder

  • [X] I have read the README and searched the existing issues.

Reproduction

deepspeed --num_gpus 6 --master_port=9901 src/train_bash.py \
    --deepspeed ds_config.json \
    --train_id testetstets \
    --stage pt \
    --do_train \
    --model_name_or_path llm_model/Baichuan-7B \
    --dataset /pretrain_data \
    --finetuning_type full \
    --output_dir /test_pt/ \
    --overwrite_cache \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 1 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 2.0 \
    --plot_loss \
    --fp16

DeepSpeed config:

{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "zero_allow_untested_optimizer": true,
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "initial_scale_power": 16,
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 5e8,
    "reduce_scatter": true,
    "reduce_bucket_size": 5e8,
    "overlap_comm": false,
    "contiguous_gradients": true
  }
}

Expected behavior

According to the hardware requirements in the project README, full-parameter mixed-precision training of a 7B model should need about 160 GB; my training data is just the wiki_demo.txt shipped with the project. In practice, however, the memory usage is far higher than that. Does anyone know what causes this?

[screenshot 1704952751679]
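(For scale, a rough back-of-the-envelope check of that 160 GB figure, assuming standard mixed-precision Adam and ignoring activations: fp16 weights, fp32 master weights, Adam moments, and fp16 gradients come to roughly 16 bytes per parameter.)

# 2 B fp16 weights + 4 B fp32 master weights + 8 B Adam m/v + 2 B fp16 grads ≈ 16 B/param
echo "$((7 * 16)) GB"   # 7B params x 16 B ≈ 112 GB of model/optimizer state, before activations and buffers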

System Info

CUDA: 12.0, transformers: 4.34.0; everything else was installed from requirements.txt.

Others

No response

Smilefish1 · Jan 11 '24 06:01

It depends on the batch size, cutoff length, and so on.
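For example, these are the flags that control those settings (flag names as I recall them from the LLaMA-Factory CLI; please double-check against your version), to be added to the command above:

--per_device_train_batch_size 1 \
--gradient_accumulation_steps 1 \
--cutoff_len 512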

linchen111 · Jan 11 '24 07:01

It depends on the batch size, cutoff length, and so on.

My batch size is currently set to 1 and the gradient accumulation steps are also 1; the cutoff is the default 1024. Let me try setting the cutoff to 256.

Smilefish1 · Jan 11 '24 07:01

It depends on the batch size, cutoff length, and so on.

I just tried changing the cutoff length from 1024 to 512, and the VRAM used during pretraining didn't change.

[screenshot 1704959630013]

Smilefish1 · Jan 11 '24 07:01

Enable ZeRO-3.
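For context: the config above uses ZeRO stage 2, which partitions only optimizer states and gradients across the 6 GPUs, while the full fp16 parameters are still replicated on every card; stage 3 also partitions the parameters. A sketch of a stage-3 zero_optimization block to swap in for the one above (values taken from the generic DeepSpeed/Transformers examples, not tuned settings):

"zero_optimization": {
  "stage": 3,
  "overlap_comm": false,
  "contiguous_gradients": true,
  "sub_group_size": 1e9,
  "reduce_bucket_size": "auto",
  "stage3_prefetch_bucket_size": "auto",
  "stage3_param_persistence_threshold": "auto",
  "stage3_max_live_parameters": 1e9,
  "stage3_max_reuse_distance": 1e9,
  "stage3_gather_16bit_weights_on_model_save": true
}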

xll2001 · Jan 11 '24 10:01

Enable ZeRO-3.

OK, I'll give it a try. Thanks!

Smilefish1 · Jan 11 '24 10:01