Significant Inference Time Discrepancy Between craft_mlt_25k.pth and CRAFT_clr_amp_29500.pth on Various CPUs
Hey,
I've been using EasyOCR for text detection and have come across a significant performance discrepancy between two provided models: craft_mlt_25k.pth (original CRAFT model) and CRAFT_clr_amp_29500.pth (EasyOCR's version). This discrepancy is particularly evident in CPU-based inference times, and I believe it warrants further investigation or clarification.
Issue Description
I have conducted inference-time tests on several machines, all running CPU-only (no CUDA). Below is a comparison table of the inference times for both models across various CPU architectures; a minimal sketch of the timing harness follows the table:
| CPU Specs | craft_mlt_25k.pth | CRAFT_clr_amp_29500.pth |
|---|---|---|
| Intel® Core™ i7-6820HQ @2.70GHz | 2 sec | 18 sec |
| Intel® Core™ i7-9850HQ @2.60GHz | 1.24 sec | 10.35 sec |
| Intel® Core™ i7-10870H @2.20GHz | 0.86 sec | 8.10 sec |
| Intel® Xeon® CPU @ 2.20GHz | 6.35 sec | 68.59 sec |
| AMD Ryzen 7 6800H @4.7GHz | 0.72 sec | 0.66 sec |
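For reference, here is a minimal sketch of the kind of CPU timing harness behind the table above. The CRAFT class and the copyStateDict helper are imported from the EasyOCR source; treating each checkpoint as a plain state dict is an assumption, since a checkpoint written by the EasyOCR trainer may wrap its weights (e.g. under a 'craft' key) and need unwrapping first.

```python
import time
import torch
from easyocr.craft import CRAFT              # CRAFT architecture bundled with EasyOCR
from easyocr.detection import copyStateDict  # strips the DataParallel 'module.' prefix

def load_model(weights_path):
    # Assumes the file holds a plain state dict; trainer checkpoints may
    # wrap the weights and need unwrapping before load_state_dict().
    net = CRAFT()
    net.load_state_dict(copyStateDict(torch.load(weights_path, map_location='cpu')))
    net.eval()
    return net

def mean_forward_time(net, runs=5, shape=(1, 3, 736, 1280)):
    # Random tensor standing in for a preprocessed image (H and W divisible by 32).
    x = torch.randn(shape)
    with torch.no_grad():
        net(x)  # warm-up pass, excluded from the timing
        start = time.perf_counter()
        for _ in range(runs):
            net(x)
    return (time.perf_counter() - start) / runs

for path in ('craft_mlt_25k.pth', 'CRAFT_clr_amp_29500.pth'):
    print(path, f'{mean_forward_time(load_model(path)):.2f} s')
```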
Observations
- The CRAFT_clr_amp_29500.pth model shows significantly longer inference times on Intel CPUs than the original craft_mlt_25k.pth.
- On the AMD Ryzen 7 6800H, the performance difference is negligible.
- The slowdown is consistent and significant on every Intel CPU tested, roughly an order of magnitude regardless of CPU generation.
Queries and Request for Insight
- Is there a specific reason for this substantial difference in performance?
- Are there any optimizations or settings in the CRAFT_clr_amp_29500.pth model that are better suited to Intel CPUs?
- Is this expected behavior, or could it indicate an issue in how the model is optimized for different CPU architectures?
Thank you for your time and the fantastic work on EasyOCR.
I have the same issue on an Intel Core i7-10850H; my own model is about 10x slower.
Quantization to torch.qint8 (as in the original get_detector() function) does not help.
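For context, this is the quantization step I mean, mirroring the CPU branch of get_detector(). Note that torch.quantization.quantize_dynamic only replaces nn.Linear (and recurrent) modules by default, so the convolutional bulk of CRAFT stays in fp32 either way; that may be why it makes no difference here.

```python
import torch
from easyocr.craft import CRAFT
from easyocr.detection import copyStateDict

# Load the slow checkpoint (plain state dict assumed, as in the sketch above).
net = CRAFT()
state = torch.load('CRAFT_clr_amp_29500.pth', map_location='cpu')
net.load_state_dict(copyStateDict(state))
net.eval()

# Dynamic qint8 quantization, as applied in get_detector() for CPU inference.
# By default this only swaps nn.Linear/nn.LSTM layers; CRAFT is almost entirely
# convolutional, so most of the network keeps running in fp32 regardless.
torch.quantization.quantize_dynamic(net, dtype=torch.qint8, inplace=True)
```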