Significant Inference Time Discrepancy Between craft_mlt_25k.pth and CRAFT_clr_amp_29500.pth on Various CPUs
Hey,
I've been using EasyOCR for text detection and have come across a significant performance discrepancy between two provided models: craft_mlt_25k.pth (original CRAFT model) and CRAFT_clr_amp_29500.pth (EasyOCR's version). This discrepancy is particularly evident in CPU-based inference times, and I believe it warrants further investigation or clarification.
Issue Description
I have conducted inference-time tests on several machines, all running CPU-only (no CUDA). Below is a comparison table of the inference times for both models across various CPU architectures; a minimal sketch of the timing harness follows the table:
| CPU Specs | craft_mlt_25k.pth | CRAFT_clr_amp_29500.pth |
|---|---|---|
| Intel® Core™ i7-6820HQ @2.70GHz | 2 sec | 18 sec |
| Intel® Core™ i7-9850HQ @2.60GHz | 1.24 sec | 10.35 sec |
| Intel® Core™ i7-10870H @2.20GHz | 0.86 sec | 8.10 sec |
| Intel® Xeon® CPU @ 2.20GHz | 6.35 sec | 68.59 sec |
| AMD Ryzen 7 6800H @4.7GHz | 0.72 sec | 0.66 sec |
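For reference, here is a minimal sketch of the kind of CPU timing harness behind the table above. The CRAFT class and the copyStateDict helper are imported from the EasyOCR source; treating each checkpoint as a plain state dict is an assumption, since a checkpoint written by the EasyOCR trainer may wrap its weights (e.g. under a 'craft' key) and need unwrapping first.

```python
import time
import torch
from easyocr.craft import CRAFT              # CRAFT architecture bundled with EasyOCR
from easyocr.detection import copyStateDict  # strips the DataParallel 'module.' prefix

def load_model(weights_path):
    # Assumes the file holds a plain state dict; trainer checkpoints may
    # wrap the weights and need unwrapping before load_state_dict().
    net = CRAFT()
    net.load_state_dict(copyStateDict(torch.load(weights_path, map_location='cpu')))
    net.eval()
    return net

def mean_forward_time(net, runs=5, shape=(1, 3, 736, 1280)):
    # Random tensor standing in for a preprocessed image (H and W divisible by 32).
    x = torch.randn(shape)
    with torch.no_grad():
        net(x)  # warm-up pass, excluded from the timing
        start = time.perf_counter()
        for _ in range(runs):
            net(x)
    return (time.perf_counter() - start) / runs

for path in ('craft_mlt_25k.pth', 'CRAFT_clr_amp_29500.pth'):
    print(path, f'{mean_forward_time(load_model(path)):.2f} s')
```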
Observations
- The CRAFT_clr_amp_29500.pth model shows significantly longer inference times on Intel CPUs than the original craft_mlt_25k.pth.
- On the AMD Ryzen 7 6800H, the performance difference is negligible.
- The slowdown is consistent and significant on every Intel CPU tested, roughly an order of magnitude regardless of CPU generation.
Queries and Request for Insight
- Is there a specific reason for this substantial difference in performance?
- Are there any optimizations or settings in the CRAFT_clr_amp_29500.pth model that are better suited to Intel CPUs?
- Is this expected behavior, or could it indicate an issue in how the model is optimized for different CPU architectures?
Thank you for your time and the fantastic work on EasyOCR.
I have the same issue on an Intel Core i7-10850H; my own model is about 10x slower.
Quantization to torch.qint8 (as in the original get_detector() function) does not help.
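For context, this is the quantization step I mean, mirroring the CPU branch of get_detector(). Note that torch.quantization.quantize_dynamic only replaces nn.Linear (and recurrent) modules by default, so the convolutional bulk of CRAFT stays in fp32 either way; that may be why it makes no difference here.

```python
import torch
from easyocr.craft import CRAFT
from easyocr.detection import copyStateDict

# Load the slow checkpoint (plain state dict assumed, as in the sketch above).
net = CRAFT()
state = torch.load('CRAFT_clr_amp_29500.pth', map_location='cpu')
net.load_state_dict(copyStateDict(state))
net.eval()

# Dynamic qint8 quantization, as applied in get_detector() for CPU inference.
# By default this only swaps nn.Linear/nn.LSTM layers; CRAFT is almost entirely
# convolutional, so most of the network keeps running in fp32 regardless.
torch.quantization.quantize_dynamic(net, dtype=torch.qint8, inplace=True)
```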