PaddleOCR v3 with OpenVINO is much slower than PaddleOCR v2
### 🔎 Search before asking
- [x] I have searched the PaddleOCR Docs and found no similar bug report.
- [x] I have searched the PaddleOCR Issues and found no similar bug report.
- [x] I have searched the PaddleOCR Discussions and found no similar bug report.
### 🐛 Bug (Description)
Using the same PP-OCRv4 models, PaddleOCR v3 is much slower than PaddleOCR v2 with OpenVINO (more than 2x slower).
### 🏃‍♂️ Environment
OS: Ubuntu 22.04.5 LTS
Environment: Jupyter
Python: 3.11.5
PaddleOCR: 3.2.0 and 2.7.0.3
Install: pip
RAM: 24 GB
CPU: AMD Ryzen 5 PRO 4650U with Radeon Graphics
CUDA: None
### 🌰 Minimal Reproducible Example
In the latest version, I use:

```python
from paddlex.inference import load_pipeline_config
from paddleocr import PaddleOCR

# Configure the PaddleX OCR pipeline to force HPI -> OpenVINO for the detector and recognizer
cfg_v3_openvino = load_pipeline_config("OCR")
cfg_v3_openvino["use_doc_preprocessor"] = False
cfg_v3_openvino["use_textline_orientation"] = False
for sub_name in ("TextDetection", "TextRecognition"):
    sub_cfg = cfg_v3_openvino["SubModules"][sub_name]
    sub_cfg["use_hpip"] = True
    sub_cfg["hpi_config"] = {
        "auto_config": False,
        "backend": "openvino",
        "backend_config": {"cpu_num_threads": 8},
        "auto_paddle2onnx": True,
    }

ocr_v3_hpi = PaddleOCR(
    ocr_version="PP-OCRv4",
    lang="en",
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_textline_orientation=False,
    device="cpu",
    paddlex_config=cfg_v3_openvino,
    text_detection_model_name="PP-OCRv4_mobile_det",
    text_recognition_model_name="en_PP-OCRv4_mobile_rec",
)
```
For PaddleOCR v2 with OpenVINO, the code is longer; it is adapted from this tutorial: https://docs.openvino.ai/2024/notebooks/paddle-ocr-webcam-with-output.html
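For reference, here is a condensed sketch of that v2-style approach: OpenVINO reads the exported `.pdmodel` directly through its Paddle frontend, with no ONNX conversion step (the model path is a placeholder; adjust it to your export directory):

```python
# v2-style loading: OpenVINO's Paddle frontend reads .pdmodel directly.
# The path below is hypothetical -- point it at your exported model.
import openvino.runtime as ov

core = ov.Core()
model = core.read_model("en_PP-OCRv4_mobile_det/inference.pdmodel")
compiled = core.compile_model(model, "CPU")
# compiled(...) can then be called on preprocessed image tensors,
# as in the paddle-ocr-webcam notebook.
```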
For one example image, inference takes 300 ms with the old PaddleOCR version and 691 ms with the latest version.
Without OpenVINO, both versions have similar inference times (~1130 ms).
Why is OpenVINO not as fast as before? Thank you.
As we know, OpenVINO does not support AMD hardware, which means the acceleration on AMD CPUs and GPUs is very limited.
If you are using the AMD Ryzen 5 PRO 4650U with Radeon Graphics, please use ROCm to accelerate inference on the AMD CPU and integrated GPU. For details, refer to: https://github.com/liebedir/PP-OCRv5-AMD-ROCm
Excellent detailed investigation @dienhoa! This is a significant performance regression that deserves attention. Based on your configuration and the AMD hardware involved, here are some technical insights and potential solutions:
### Performance Analysis
#### Root Causes of the Regression
The 2x slowdown (300ms → 691ms) with OpenVINO suggests several possible issues in the v3 architecture changes:
- **Model Architecture Changes:** PP-OCRv4 models may have architectural differences that don't optimize as well with OpenVINO's current optimization passes
- **HPI Integration Overhead:** The new PaddleX HPI (high-performance inference) layer might introduce additional overhead
- **OpenVINO Version Mismatch:** Different OpenVINO versions between the v2 and v3 implementations
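A quick check for the version-mismatch hypothesis is to print the OpenVINO build that each environment actually loads:

```python
# Run this in both the v2 and v3 environments and compare the outputs
import openvino.runtime as ov
print(ov.get_version())
```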
### Diagnostic Steps
**1. Profile the Pipeline Components**
```python
import time

# Profile detection alone vs. the full pipeline
def profile_ocr_pipeline(ocr_instance, image_path):
    # Detection only (det/rec flags follow the 2.x ocr() API)
    det_start = time.time()
    det_result = ocr_instance.ocr(image_path, det=True, rec=False)
    det_time = time.time() - det_start

    # Full pipeline (detection + recognition)
    full_start = time.time()
    full_result = ocr_instance.ocr(image_path)
    full_time = time.time() - full_start

    print(f"Detection only: {det_time:.3f}s")
    print(f"Full pipeline: {full_time:.3f}s")
    print(f"Recognition (approx.): {full_time - det_time:.3f}s")
    return det_time, full_time
```
**2. Optimize OpenVINO Configuration**

Try these enhanced HPI settings:
```python
# More aggressive OpenVINO settings. Note: not every key below is guaranteed
# to be supported by your PaddleX version's OpenVINO backend_config --
# check the HPI documentation before relying on them.
sub_cfg["hpi_config"] = {
    "auto_config": False,
    "backend": "openvino",
    "backend_config": {
        "cpu_num_threads": 8,
        "precision": "FP16",               # faster on modern CPUs
        "enable_dynamic_batch": True,
        "performance_hint": "THROUGHPUT",  # vs. "LATENCY"
        "inference_num_threads": 4,
        "cpu_bind_thread": True,
    },
    "auto_paddle2onnx": True,
    "enable_profile": True,  # detailed timing
}
```
**3. Alternative: Direct OpenVINO Optimization**

For maximum control, you could bypass the HPI layer:
```python
# Load the model directly with OpenVINO (similar to the v2 approach)
import openvino.runtime as ov

# Configure the CPU plugin before compiling
core = ov.Core()
core.set_property("CPU", {"PERFORMANCE_HINT": "THROUGHPUT"})
core.set_property("CPU", {"CPU_THREADS_NUM": "8"})  # legacy key; newer releases use INFERENCE_NUM_THREADS

# Compile with explicit optimization (model path is a placeholder)
model = core.read_model("det.onnx")
compiled_model = core.compile_model(model, "CPU")
```
### Hardware-Specific Optimizations for AMD
While @openvino-book correctly mentioned ROCm, there are additional AMD CPU optimizations:
**CPU-Specific Flags**
```python
import os

# AMD Zen architecture optimizations via OpenMP thread pinning
os.environ["OMP_NUM_THREADS"] = "8"
os.environ["GOMP_CPU_AFFINITY"] = "0-7"  # bind to physical cores
os.environ["OMP_PROC_BIND"] = "true"
os.environ["OMP_PLACES"] = "cores"

# AMD-friendly BLAS thread counts, if those libraries are in use
os.environ["OPENBLAS_NUM_THREADS"] = "8"
os.environ["MKL_NUM_THREADS"] = "8"  # if Intel MKL is available
```
### Benchmarking Methodology
To ensure a fair comparison:
```python
import time
import numpy as np

# Warm-up inference (important for OpenVINO's first-run compilation cost)
for _ in range(3):
    ocr_instance.ocr(image_path)

# Actual benchmark
times = []
for _ in range(10):
    start = time.time()
    result = ocr_instance.ocr(image_path)
    times.append(time.time() - start)

avg_time = np.mean(times[2:])  # conservatively drop the first two timed runs too
std_time = np.std(times[2:])
print(f"Average: {avg_time:.3f}s ± {std_time:.3f}s")
```
### Next Steps
- **Profile breakdown:** Run the component profiling above to identify the bottleneck
- **Try optimized config:** Test the enhanced HPI configuration above
- **Version comparison:** Check the OpenVINO versions used in v2 vs. v3
- **Model comparison:** Compare ONNX model sizes and architectures between versions (see the size check below)
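For the model comparison, a minimal size check (both paths are placeholders; point them at your exported `.pdmodel` and the ONNX file generated by `auto_paddle2onnx`):

```python
# Hypothetical paths -- compare the on-disk sizes of the two formats
import os

for path in ("en_PP-OCRv4_mobile_det/inference.pdmodel", "det.onnx"):
    print(f"{path}: {os.path.getsize(path) / 1e6:.1f} MB")
```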
@changdazhou It would be helpful to know:
- What OpenVINO version does v3 use vs v2?
- Are there known performance optimization guidelines for PP-OCRv4 + OpenVINO?
- Any plans to optimize the HPI integration overhead?
Happy to help with further testing and optimization! This kind of performance regression affects many users migrating from v2 to v3.
Thank you, @galafis and @openvino-book, for the help.
When I accelerate PP-OCRv4 models with OpenVINO in PaddleOCR v2, I get a significant speed-up even on my AMD CPU.
When I retried with PaddleOCR v3, even with the same models (PP-OCRv4), it's much slower, and I'm not really sure why. Furthermore, inference in PaddleOCR is now handled inside PaddleX, which makes it harder to understand how PaddleX compiles and uses OpenVINO and to compare it with the previous version.
I also tried with an Intel CPU (Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz) and the result is the same: PaddleOCR v3 with OpenVINO is much slower than v2 with OpenVINO.
I believe the most likely reason for this difference is that in PaddleOCR 2.x (which depends on PaddlePaddle 2.x), OpenVINO can directly load the pdmodel format. However, since the model format in PaddlePaddle 3.x has been changed to JSON, OpenVINO can no longer read it directly. As a result, PaddleOCR 3.x (which depends on PaddlePaddle 3.x) first converts the model to ONNX format, and then performs inference with OpenVINO.
To verify this, you can try converting a PaddleOCR 2.x model to ONNX using the Paddle2ONNX tool, and then run inference with OpenVINO to measure the speed.
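A rough sketch of that experiment, assuming a 2.x-format detection export in `en_PP-OCRv4_mobile_det/` (paths and the dummy input shape are placeholders):

```python
# 1) Convert the Paddle 2.x export to ONNX (run once, in a shell):
#    paddle2onnx --model_dir en_PP-OCRv4_mobile_det \
#                --model_filename inference.pdmodel \
#                --params_filename inference.pdiparams \
#                --save_file det.onnx
#
# 2) Time OpenVINO on both formats with the same dummy input.
import time
import numpy as np
import openvino.runtime as ov

core = ov.Core()
dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)  # placeholder shape

for path in ("en_PP-OCRv4_mobile_det/inference.pdmodel", "det.onnx"):
    compiled = core.compile_model(core.read_model(path), "CPU")
    compiled(dummy)  # warm-up
    start = time.time()
    for _ in range(20):
        compiled(dummy)
    print(f"{path}: {(time.time() - start) / 20 * 1000:.1f} ms/run")
```

If the ONNX path is consistently slower here as well, that would point to the format change, rather than the PaddleX HPI layer itself, as the main cause.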
Ah, thank you. So the OpenVINO speed can differ depending on whether the model is in .pdmodel or ONNX format. Maybe it's faster with .pdmodel?
That's very interesting! Thanks again for pointing this out.
How was this completed/solved?
Unfortunately, I get the same speed in PaddleOCR v2 with ONNX or .pdmodel; both are very fast. It's only in PaddleOCR v3 that it's slower.