LLaMA-Factory: pre-training efficiency issue

Pre-training efficiency issue

Open 18140663659 opened this issue 10 months ago • 1 comments

Reminder

[X] I have read the README and searched the existing issues.

Reproduction

accelerate launch src/train_bash.py \
    --stage pt \
    --model_name_or_path $model_name_or_path \
    --do_train \
    --dataset $dataset \
    --streaming \
    --max_steps 10000 \
    --finetuning_type full \
    --output_dir $output_dir \
    --overwrite_cache \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 500 \
    --save_total_limit 2 \
    --learning_rate 5e-6 \
    --num_train_epochs 1.0 \
    --plot_loss \
    --use_fast_tokenizer false \
    --preprocessing_num_workers 64 \
    --cutoff_len 2048 \
    --bf16 \
    --warmup_steps 10 \
    --max_grad_norm 1.0 2>&1 | tee $output_dir/log.txt

Expected behavior

Running the pre-training command above on 7 GPUs, 10,000 steps take roughly 2 days. Are there any ways to speed this up?
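
For reference, the figures in the report can be turned into a rough throughput estimate. The sketch below uses only the values from the command (7 GPUs, batch size 2, 8 accumulation steps, cutoff_len 2048, 10,000 steps) and the reported duration of about 2 days. It assumes every sequence is filled to exactly cutoff_len tokens, which the report does not confirm, so the result should be read as an upper bound on tokens processed, not a measurement.

```python
# Values taken from the command and the report above.
num_gpus = 7
per_device_train_batch_size = 2
gradient_accumulation_steps = 8
cutoff_len = 2048
max_steps = 10_000
wall_clock_days = 2  # reported as "roughly 2 days", so results are approximate

# Assumption: each sequence holds exactly cutoff_len tokens.
sequences_per_step = (
    num_gpus * per_device_train_batch_size * gradient_accumulation_steps
)
tokens_per_step = sequences_per_step * cutoff_len
total_tokens = tokens_per_step * max_steps

wall_clock_seconds = wall_clock_days * 24 * 3600
seconds_per_step = wall_clock_seconds / max_steps
tokens_per_second = total_tokens / wall_clock_seconds
tokens_per_second_per_gpu = tokens_per_second / num_gpus

print(f"sequences per optimizer step: {sequences_per_step}")
print(f"tokens per optimizer step:    {tokens_per_step}")
print(f"seconds per optimizer step:   {seconds_per_step:.2f}")
print(f"tokens/s across all GPUs:     {tokens_per_second:.0f}")
print(f"tokens/s per GPU:             {tokens_per_second_per_gpu:.0f}")
```

Under these assumptions each optimizer step takes about 17 seconds. Comparing the per-GPU tokens/s figure before and after a change is a simple way to check whether a change actually helps.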

System Info

torch==1.14.0a0+410ce96
uvicorn
fastapi==0.95.1
sse-starlette
tiktoken
trl==0.7.4
peft>=0.4.0
accelerate>=0.21.0
jieba
rouge-chinese
gradio
fsspec==2023.9.2
transformers==4.31.0
#deepspeed==0.9.1
deepspeed==0.9.3
nltk
openpyxl

Others

None

Apr 10 '24 02:04 18140663659

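I don't know which model you are training.

As long as you don't run out of GPU memory, increase per_device_train_batch_size moderately.
Enabling flash attention will also speed things up a little.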

Apr 10 '24 03:04 codemayq
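
One possible way to apply the batch-size advice is to raise per_device_train_batch_size and lower gradient_accumulation_steps by the same factor, so the number of sequences per optimizer step stays the same. The reply does not say to do this; it is an assumption made here so that the learning rate and warmup settings stay comparable. The helper name `equivalent_settings` is hypothetical, and whether a larger per-device batch fits in memory has to be checked by running it.

```python
def equivalent_settings(
    per_device_batch_size: int, grad_accum_steps: int
) -> list[tuple[int, int]]:
    """Return every (batch size, accumulation steps) pair with the same product."""
    product = per_device_batch_size * grad_accum_steps
    return [(b, product // b) for b in range(1, product + 1) if product % b == 0]


# Current values from the command in the issue.
current = (2, 8)

for batch_size, accum in equivalent_settings(*current):
    marker = "  <- current" if (batch_size, accum) == current else ""
    print(
        f"--per_device_train_batch_size {batch_size} "
        f"--gradient_accumulation_steps {accum}{marker}"
    )
```

Starting from the current setting, the next candidates to try would be 4 with 4 accumulation steps, then 8 with 2, stopping at the largest per-device batch size that does not run out of GPU memory.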
