
Inference speed on CPU is very slow compared to GPU inference

Open vishal-nayak1 opened this issue 2 years ago • 2 comments

@gwkrsrch I have tried to run the inference script on CPU; the CPU inference time is very high compared to the GPU inference time. Can you fix this issue?

vishal-nayak1 avatar Oct 04 '22 12:10 vishal-nayak1

Is there any solution to this? @gwkrsrch, thank you

trikiamine23 avatar Oct 10 '22 09:10 trikiamine23

Hi guys, I've been playing around with the library. Looks quite interesting, great work @gwkrsrch. Just some notes here on the time taken. Please note that I used a Dell Inspiron 15 with 8 GB RAM, running the latest Linux Mint (21.0). Take the results with a pinch of salt, as my machine has the most basic configuration and is not GPU enabled.

  1. Without GPU and CUDA installed, it took approx. 4 hours to complete.
  2. With CUDA installed, the same tests completed within 1 hour.

So I wanted to ask: what is the minimum required in terms of GPU and RAM on my laptop to be able to train with a decent number of images (500 - 1000)? Thanks.
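For anyone comparing timings like the ones above, it is worth first confirming that PyTorch actually sees the GPU; otherwise everything silently runs on the CPU even after installing CUDA. A quick check (standard PyTorch calls, nothing Donut-specific):

```python
import torch

# CUDA version PyTorch was built against (None for CPU-only builds)
print(torch.version.cuda)

# True only if a usable GPU and driver are present
print(torch.cuda.is_available())

if torch.cuda.is_available():
    # Name of the first visible GPU, e.g. "NVIDIA GeForce RTX 3060"
    print(torch.cuda.get_device_name(0))
```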

ghost avatar Oct 16 '22 11:10 ghost

Hi @dneemuth @trikiamine23 @vishal-nayak1 ,

I recently updated some lines to make CPU inference faster. It seems torch.bfloat16-related lines were the source of the issue. Please use the latest version. Hope this update helps :)
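For anyone reading along, a minimal sketch of the idea (illustrative only, not the actual Donut diff): pick the dtype per device, since bfloat16 kernels are slow or poorly optimized on many CPUs, while float32 is fast there.

```python
import torch

# Illustrative sketch, not the actual Donut patch: choose the dtype
# based on the device. bfloat16 is fine on recent GPUs but can be
# very slow on CPUs, where float32 is the safe default.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

# In practice you would do something like
#   model.to(device=device, dtype=dtype)
# before running inference. Here a tensor stands in for the model:
x = torch.randn(4, 4, device=device, dtype=dtype)
print(x.dtype)
```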

Feel free to reopen this or open another issue if you have anything new for sharing/debugging.

gwkrsrch avatar Nov 16 '22 13:11 gwkrsrch

@NielsRogge Can you please port these changes to the Transformers library as well?

Thanks

vishal-nayak1 avatar Nov 21 '22 09:11 vishal-nayak1

Hi @gwkrsrch,

Could you clarify which changes are necessary for fast CPU inference? Then I'll update the 🤗 model as well.

NielsRogge avatar Nov 21 '22 09:11 NielsRogge

Hi @gwkrsrch thank you very much for the changes.

I have noticed that things did not change on my end. Do you have any reference timings on CPU (before/after the change)?

For me: 300 dpi, 2 fields to extract, 16 CPUs = 10 seconds.

trikiamine23 avatar Dec 08 '22 10:12 trikiamine23

I have the same issue. An image of size 1658x2343 takes around 40 seconds to classify. I'm running on 8 CPUs...

HolzmanoLagrene avatar Dec 09 '22 21:12 HolzmanoLagrene