# Slow batched evals
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
- llamafactory version: 0.8.3.dev0
- Python: 3.11.9
- Platform: AWS EC2 instance
### Reproduction
# /home/ft/ contains a Phi-3 3.8B checkpoint
llamafactory-cli train \
--stage sft \
--model_name_or_path /home/ft/ \
--preprocessing_num_workers 16 \
--finetuning_type full \
--template phi \
--flash_attn fa2 \
--dataset_dir data \
--dataset triples_new_ds \
--cutoff_len 4096 \
--max_samples 500 \
--per_device_eval_batch_size 8 \
--predict_with_generate True \
--max_new_tokens 1024 \
--top_p 1 \
--temperature 1 \
--output_dir <out_dir> \
--do_predict True \
--quantization_method bitsandbytes \
--seed $seed
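For reference, here is a minimal standalone timing sketch (plain `transformers`, bypassing LLaMA-Factory entirely; the model path matches the command above and the prompts are made-up placeholders) that compares a batch of 8 against one-at-a-time generation. If the batched call is dominated by its longest member, the gap should mirror what I see in the eval:

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/home/ft/"  # same local Phi-3 3.8B checkpoint as in the command above
tok = AutoTokenizer.from_pretrained(model_path)
tok.padding_side = "left"  # decoder-only models must be left-padded for batched generate
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="cuda"
)

# Placeholder prompts; the real ones come from triples_new_ds.
prompts = [f"Question {i}: explain what a knowledge-graph triple is." for i in range(8)]
gen_kwargs = dict(
    max_new_tokens=1024, do_sample=True, temperature=1.0, top_p=1.0,
    pad_token_id=tok.pad_token_id,
)

# Batched: every decode step advances all 8 rows, so the call only returns
# once the longest sequence in the batch has emitted EOS (or hit 1024 tokens).
batch = tok(prompts, return_tensors="pt", padding=True).to(model.device)
t0 = time.perf_counter()
model.generate(**batch, **gen_kwargs)
print(f"batch of 8: {time.perf_counter() - t0:.1f}s")

# Sequential: each prompt stops at its own EOS, so short answers finish early.
t0 = time.perf_counter()
for p in prompts:
    ids = tok(p, return_tensors="pt").to(model.device)
    model.generate(**ids, **gen_kwargs)
print(f"one by one: {time.perf_counter() - t0:.1f}s")
```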
### Expected behavior
Hi,
The above script runs a batched eval on 500 examples on an A100 node with batch size 8 and takes about an hour. That is significantly slower than running the same eval with batch size 1, which finishes in around 15 minutes. Do you know why this might be happening (maybe each batch is bottlenecked by its longest generation)? And is there a way to make batched evals faster? The model is small, so I want enough parallelism to use the full available GPU resources. Thanks so much!
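In case it helps the discussion: a continuous-batching engine would avoid this failure mode, since a finished sequence immediately frees its slot for the next prompt instead of idling until the batch's longest generation ends. A minimal offline sketch with vLLM (assuming it is acceptable to run predictions outside llamafactory-cli; the model path is the same local checkpoint and the prompts are placeholders):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="/home/ft/", dtype="bfloat16")  # same local Phi-3 3.8B checkpoint
params = SamplingParams(temperature=1.0, top_p=1.0, max_tokens=1024)

# Placeholder prompts; the real ones would be the 500 eval examples,
# rendered with the phi chat template.
prompts = [f"Question {i}: explain what a knowledge-graph triple is." for i in range(500)]

outputs = llm.generate(prompts, params)  # vLLM schedules all 500 with continuous batching
for out in outputs:
    print(out.outputs[0].text)
```

Happy to go this route if that is the recommended workaround.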
### Others
Thanks again for the great work!