llm-foundry
Slow on V100
Hi team, I'm fine-tuning with 6 V100 GPUs, and the process is extremely slow. I'm using fp16 with attn_impl: torch, a global_train_batch_size of 12, and device_train_microbatch_size: auto (which resolves to 2 per device). Even after 15 hours, I haven't finished one-third of an epoch (500k rows of data). Did I miss anything?
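For reference, a sketch of the relevant parts of my YAML config (field names follow the llm-foundry finetune example configs; other sections omitted):

```yaml
# Precision and attention implementation described above.
# V100s lack bf16 support, so fp16 (amp_fp16) is used here.
precision: amp_fp16

model:
  attn_config:
    attn_impl: torch   # torch attention; flash/triton may not support V100

# Batch sizing: 12 global across 6 GPUs, microbatch auto-tuned (resolves to 2).
global_train_batch_size: 12
device_train_microbatch_size: auto
```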