deep-learning-pytorch-huggingface

Inference on CNN validation set takes 2+ hours on p4d.24xlarge machine with 8 A100s, 40GB each

Open sverneka opened this issue 2 years ago • 5 comments

I ran the code as-is for training; at the end of each epoch it runs inference on the test set. I found that inference takes far too long, and GPU memory and utilization max out on a p4d.24xlarge, which has 8 A100s with 40GB each. Surprisingly, training was much faster than inference! Any idea how to fix this? Thanks!

sverneka avatar Apr 08 '23 22:04 sverneka
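One likely explanation (an assumption on my part, since the thread doesn't show the evaluation code): if the end-of-epoch evaluation calls `model.generate`, it pays one decoder forward pass per generated token, whereas a training step does a single teacher-forced forward pass per batch. A toy sketch of that asymmetry, with no real model:

```python
# Toy cost model: why evaluation with autoregressive generation can be far
# slower than training on the same hardware.

def training_decoder_passes(batches: int) -> int:
    # Teacher forcing: every target position in a batch is predicted
    # in ONE decoder forward pass.
    return batches

def generation_decoder_passes(batches: int, max_new_tokens: int) -> int:
    # Greedy decoding: one decoder forward pass per generated token,
    # for every batch.
    return batches * max_new_tokens

# e.g. 500 eval batches, 200-token summaries:
print(training_decoder_passes(500))          # 500
print(generation_decoder_passes(500, 200))   # 100000
```

So with 200-token outputs, generation-based evaluation does roughly 200× the decoder work of a training epoch over the same data, before counting beam search or sampling overhead.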

@stas00 Can you please help me with this? Thanks!

sverneka avatar Apr 09 '23 16:04 sverneka

I'm not quite sure why you're tagging me here as I am not part of this project and I have no idea what code you're talking about.

If it's a transformers question, please ask at https://github.com/huggingface/transformers/issues and give the full context of the issue.

Thank you.

stas00 avatar Apr 10 '23 17:04 stas00

The same issue occurs while fine-tuning Flan-T5 with LoRA and bnb int-8 on a summarisation dataset using a single A100 40GB. Inference takes a long time while training is very fast. Any solution? Thank you!

Ro0tee avatar Apr 13 '23 06:04 Ro0tee
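If per-epoch evaluation is the bottleneck, a few `Seq2SeqTrainer` knobs can cut its cost. This is a hedged sketch, assuming a setup like the repo's Flan-T5/LoRA example; the values and the output path are illustrative, not the author's actual config:

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative values -- tune for your dataset.
training_args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-lora-out",   # hypothetical path
    per_device_eval_batch_size=16,
    predict_with_generate=True,
    generation_max_length=64,        # cap summary length during eval
    generation_num_beams=1,          # greedy decoding is much cheaper than beam search
    evaluation_strategy="epoch",     # or "no" to skip eval until training finishes
)

# Evaluating on a subset of the validation set also helps, e.g.:
# trainer = Seq2SeqTrainer(..., eval_dataset=tokenized_eval.select(range(500)))
```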

This doesn't seem like an issue to me. Have you tried running inference after the training is done, and adjusting the generation parameters?

philschmid avatar Apr 13 '23 06:04 philschmid
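To make the post-training inference suggested above as cheap as possible, the LoRA adapter can first be merged into the base weights; `merge_and_unload` is PEFT's API for folding the adapter in and removing its runtime overhead. A sketch with placeholder model/adapter names (note: merging requires the base model in fp16/fp32, not bnb int-8):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

base_id = "google/flan-t5-xl"  # placeholder base model
base = AutoModelForSeq2SeqLM.from_pretrained(base_id, device_map="auto")

# "flan-t5-lora-out" is a hypothetical adapter checkpoint directory.
model = PeftModel.from_pretrained(base, "flan-t5-lora-out")
model = model.merge_and_unload()  # fold LoRA weights into the base model
model.eval()

tokenizer = AutoTokenizer.from_pretrained(base_id)
inputs = tokenizer("summarize: ...", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

This way the generation cost is paid once, on the finished model, instead of at every epoch.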

I have seen the same problem, with warning messages like:
Invalidate trace cache @ step 0: expected module 2, but got module 0

sadahanu avatar Apr 25 '23 22:04 sadahanu