flan-alpaca-lora
Training script takes more than 2 hours to finish
Hi. Thanks for your nice work!
I tried running your training script on an RTX 3090 with the exact dependencies you suggested. It took more than 2 hours to finish instead of 20 minutes. I also tried training flan-t5-large, and that took more than 4 hours. What could be the reasons for this?
It is hard to locate the problem without details of the machine running the code. There are several possible reasons: a different dataset, a different CUDA version, a CPU bottleneck, a data-bus bottleneck, or an older GPU suffering a performance drop. However, if the code runs smoothly, the total time may not be a problem, since you can just let it run.
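One way to narrow down which of these is the culprit is to time a handful of training steps and extrapolate the full run: if the projected time already matches the slow wall-clock time, the per-step cost itself is high (GPU/CUDA side); if not, the overhead is elsewhere (e.g. data loading). A minimal sketch, where `step_fn`, the step counts, and the dummy workload are all placeholders, not part of the actual training script:

```python
import time

def profile_steps(step_fn, n_steps=50):
    """Time n_steps calls of step_fn and return the average seconds per step.
    step_fn stands in for one forward/backward pass of the real trainer."""
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    return (time.perf_counter() - start) / n_steps

# Dummy CPU workload standing in for a real training step.
avg_step = profile_steps(lambda: sum(i * i for i in range(10_000)))

# Hypothetical schedule: 3 epochs of 1600 steps each.
total_steps = 3 * 1600
projected_min = avg_step * total_steps / 60
print(f"avg step: {avg_step * 1000:.2f} ms; projected run: {projected_min:.1f} min")
```

Comparing the projection against the observed 2-hour run tells you whether to look at per-step compute or at everything between steps.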
Thanks for your answer. I just thought it shouldn't be that different :)