# Slow batched evals
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
- llamafactory version: 0.8.3.dev0
- Python: 3.11.9
- Platform: AWS EC2 instance
### Reproduction
# /home/ft/ contains a Phi-3 3.8B checkpoint
llamafactory-cli train \
--stage sft \
--model_name_or_path /home/ft/ \
--preprocessing_num_workers 16 \
--finetuning_type full \
--template phi \
--flash_attn fa2 \
--dataset_dir data \
--dataset triples_new_ds \
--cutoff_len 4096 \
--max_samples 500 \
--per_device_eval_batch_size 8 \
--predict_with_generate True \
--max_new_tokens 1024 \
--top_p 1 \
--temperature 1 \
--output_dir <out_dir> \
--do_predict True \
--quantization_method bitsandbytes \
--seed $seed
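For reference, here is a minimal standalone timing sketch (plain `transformers`, bypassing LLaMA-Factory entirely; the model path matches the command above and the prompts are made-up placeholders) that compares a batch of 8 against one-at-a-time generation. If the batched call is dominated by its longest member, the gap should mirror what I see in the eval:

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/home/ft/"  # same local Phi-3 3.8B checkpoint as in the command above
tok = AutoTokenizer.from_pretrained(model_path)
tok.padding_side = "left"  # decoder-only models must be left-padded for batched generate
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="cuda"
)

# Placeholder prompts; the real ones come from triples_new_ds.
prompts = [f"Question {i}: explain what a knowledge-graph triple is." for i in range(8)]
gen_kwargs = dict(
    max_new_tokens=1024, do_sample=True, temperature=1.0, top_p=1.0,
    pad_token_id=tok.pad_token_id,
)

# Batched: every decode step advances all 8 rows, so the call only returns
# once the longest sequence in the batch has emitted EOS (or hit 1024 tokens).
batch = tok(prompts, return_tensors="pt", padding=True).to(model.device)
t0 = time.perf_counter()
model.generate(**batch, **gen_kwargs)
print(f"batch of 8: {time.perf_counter() - t0:.1f}s")

# Sequential: each prompt stops at its own EOS, so short answers finish early.
t0 = time.perf_counter()
for p in prompts:
    ids = tok(p, return_tensors="pt").to(model.device)
    model.generate(**ids, **gen_kwargs)
print(f"one by one: {time.perf_counter() - t0:.1f}s")
```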
### Expected behavior
Hi,
The above script runs a batched eval on 500 examples on an A100 node with batch size 8 and takes about an hour. That is significantly slower than running the same eval with batch size 1, which finishes in around 15 minutes. Do you know why this might be happening (maybe each batch is bottlenecked by its longest generation)? And is there a way to make batched evals faster? The model is small, so I want enough parallelism to use the full available GPU resources. Thanks so much!
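In case it helps the discussion: a continuous-batching engine would avoid this failure mode, since a finished sequence immediately frees its slot for the next prompt instead of idling until the batch's longest generation ends. A minimal offline sketch with vLLM (assuming it is acceptable to run predictions outside llamafactory-cli; the model path is the same local checkpoint and the prompts are placeholders):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="/home/ft/", dtype="bfloat16")  # same local Phi-3 3.8B checkpoint
params = SamplingParams(temperature=1.0, top_p=1.0, max_tokens=1024)

# Placeholder prompts; the real ones would be the 500 eval examples,
# rendered with the phi chat template.
prompts = [f"Question {i}: explain what a knowledge-graph triple is." for i in range(500)]

outputs = llm.generate(prompts, params)  # vLLM schedules all 500 with continuous batching
for out in outputs:
    print(out.outputs[0].text)
```

Happy to go this route if that is the recommended workaround.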
### Others
Thanks again for the great work!