PaddleOCR v3 with OpenVINO is much slower than PaddleOCR v2
### 🔎 Search before asking
- [x] I have searched the PaddleOCR Docs and found no similar bug report.
- [x] I have searched the PaddleOCR Issues and found no similar bug report.
- [x] I have searched the PaddleOCR Discussions and found no similar bug report.
### 🐛 Bug (Description)
Using the same PP-OCRv4 models, PaddleOCR v3 is much slower than PaddleOCR v2 with OpenVINO (more than 2x slower).
### 🏃‍♂️ Environment
OS: Ubuntu 22.04.5 LTS
Environment: Jupyter
Python: 3.11.5
PaddleOCR: 3.2.0 and 2.7.0.3
Install: pip
RAM: 24 GB
CPU: AMD Ryzen 5 PRO 4650U with Radeon Graphics
CUDA: None
### 🌰 Minimal Reproducible Example
In the latest version, I use:

```python
from paddlex.inference import load_pipeline_config
from paddleocr import PaddleOCR

# Configure the PaddleX OCR pipeline to force HPI -> OpenVINO for the detector and recognizer
cfg_v3_openvino = load_pipeline_config("OCR")
cfg_v3_openvino["use_doc_preprocessor"] = False
cfg_v3_openvino["use_textline_orientation"] = False
for sub_name in ("TextDetection", "TextRecognition"):
    sub_cfg = cfg_v3_openvino["SubModules"][sub_name]
    sub_cfg["use_hpip"] = True
    sub_cfg["hpi_config"] = {
        "auto_config": False,
        "backend": "openvino",
        "backend_config": {"cpu_num_threads": 8},
        "auto_paddle2onnx": True,
    }

ocr_v3_hpi = PaddleOCR(
    ocr_version="PP-OCRv4",
    lang="en",
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_textline_orientation=False,
    device="cpu",
    paddlex_config=cfg_v3_openvino,
    text_detection_model_name="PP-OCRv4_mobile_det",
    text_recognition_model_name="en_PP-OCRv4_mobile_rec",
)
```
For PaddleOCR v2 with OpenVINO, the code is longer; it is adapted from this tutorial: https://docs.openvino.ai/2024/notebooks/paddle-ocr-webcam-with-output.html
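For reference, here is a condensed sketch of that v2-style approach: OpenVINO reads the exported `.pdmodel` directly through its Paddle frontend, with no ONNX conversion step (the model path is a placeholder; adjust it to your export directory):

```python
# v2-style loading: OpenVINO's Paddle frontend reads .pdmodel directly.
# The path below is hypothetical -- point it at your exported model.
import openvino.runtime as ov

core = ov.Core()
model = core.read_model("en_PP-OCRv4_mobile_det/inference.pdmodel")
compiled = core.compile_model(model, "CPU")
# compiled(...) can then be called on preprocessed image tensors,
# as in the paddle-ocr-webcam notebook.
```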
For one example image, inference takes 300 ms with the old PaddleOCR version and 691 ms with the latest version.
Without OpenVINO, both versions have similar inference times (~1130 ms).
Why is OpenVINO not as fast as before? Thank you.
As we know, OpenVINO does not support AMD hardware, which means the acceleration on AMD CPUs and GPUs is very limited.
If you are using the AMD Ryzen 5 PRO 4650U with Radeon Graphics, please use ROCm to accelerate inference on the AMD CPU and integrated GPU. For details, refer to: https://github.com/liebedir/PP-OCRv5-AMD-ROCm
Excellent detailed investigation @dienhoa! This is a significant performance regression that deserves attention. Based on your configuration and the AMD hardware involved, here are some technical insights and potential solutions:
### Performance Analysis
#### Root Causes of the Regression
The 2x slowdown (300ms → 691ms) with OpenVINO suggests several possible issues in the v3 architecture changes:
- **Model Architecture Changes:** PP-OCRv4 models may have architectural differences that don't optimize as well with OpenVINO's current optimization passes
- **HPI Integration Overhead:** The new PaddleX HPI (high-performance inference) layer might introduce additional overhead
- **OpenVINO Version Mismatch:** Different OpenVINO versions between the v2 and v3 implementations
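A quick check for the version-mismatch hypothesis is to print the OpenVINO build that each environment actually loads:

```python
# Run this in both the v2 and v3 environments and compare the outputs
import openvino.runtime as ov
print(ov.get_version())
```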
### Diagnostic Steps
**1. Profile the Pipeline Components**
```python
import time

# Profile detection alone vs. the full pipeline
def profile_ocr_pipeline(ocr_instance, image_path):
    # Detection only (det/rec flags follow the 2.x ocr() API)
    det_start = time.time()
    det_result = ocr_instance.ocr(image_path, det=True, rec=False)
    det_time = time.time() - det_start

    # Full pipeline (detection + recognition)
    full_start = time.time()
    full_result = ocr_instance.ocr(image_path)
    full_time = time.time() - full_start

    print(f"Detection only: {det_time:.3f}s")
    print(f"Full pipeline: {full_time:.3f}s")
    print(f"Recognition (approx.): {full_time - det_time:.3f}s")
    return det_time, full_time
```
**2. Optimize OpenVINO Configuration**

Try these enhanced HPI settings:
```python
# More aggressive OpenVINO settings. Note: not every key below is guaranteed
# to be supported by your PaddleX version's OpenVINO backend_config --
# check the HPI documentation before relying on them.
sub_cfg["hpi_config"] = {
    "auto_config": False,
    "backend": "openvino",
    "backend_config": {
        "cpu_num_threads": 8,
        "precision": "FP16",               # faster on modern CPUs
        "enable_dynamic_batch": True,
        "performance_hint": "THROUGHPUT",  # vs. "LATENCY"
        "inference_num_threads": 4,
        "cpu_bind_thread": True,
    },
    "auto_paddle2onnx": True,
    "enable_profile": True,  # detailed timing
}
```
**3. Alternative: Direct OpenVINO Optimization**

For maximum control, you could bypass the HPI layer:
```python
# Load the model directly with OpenVINO (similar to the v2 approach)
import openvino.runtime as ov

# Configure the CPU plugin before compiling
core = ov.Core()
core.set_property("CPU", {"PERFORMANCE_HINT": "THROUGHPUT"})
core.set_property("CPU", {"CPU_THREADS_NUM": "8"})  # legacy key; newer releases use INFERENCE_NUM_THREADS

# Compile with explicit optimization (model path is a placeholder)
model = core.read_model("det.onnx")
compiled_model = core.compile_model(model, "CPU")
```
### Hardware-Specific Optimizations for AMD
While @openvino-book correctly mentioned ROCm, there are additional AMD CPU optimizations:
**CPU-Specific Flags**
```python
import os

# AMD Zen architecture optimizations via OpenMP thread pinning
os.environ["OMP_NUM_THREADS"] = "8"
os.environ["GOMP_CPU_AFFINITY"] = "0-7"  # bind to physical cores
os.environ["OMP_PROC_BIND"] = "true"
os.environ["OMP_PLACES"] = "cores"

# AMD-friendly BLAS thread counts, if those libraries are in use
os.environ["OPENBLAS_NUM_THREADS"] = "8"
os.environ["MKL_NUM_THREADS"] = "8"  # if Intel MKL is available
```
### Benchmarking Methodology
To ensure a fair comparison:
```python
import time
import numpy as np

# Warm-up inference (important for OpenVINO's first-run compilation cost)
for _ in range(3):
    ocr_instance.ocr(image_path)

# Actual benchmark
times = []
for _ in range(10):
    start = time.time()
    result = ocr_instance.ocr(image_path)
    times.append(time.time() - start)

avg_time = np.mean(times[2:])  # conservatively drop the first two timed runs too
std_time = np.std(times[2:])
print(f"Average: {avg_time:.3f}s ± {std_time:.3f}s")
```
### Next Steps
- **Profile breakdown:** Run the component profiling above to identify the bottleneck
- **Try optimized config:** Test the enhanced HPI configuration above
- **Version comparison:** Check the OpenVINO versions used in v2 vs. v3
- **Model comparison:** Compare ONNX model sizes and architectures between versions (see the size check below)
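For the model comparison, a minimal size check (both paths are placeholders; point them at your exported `.pdmodel` and the ONNX file generated by `auto_paddle2onnx`):

```python
# Hypothetical paths -- compare the on-disk sizes of the two formats
import os

for path in ("en_PP-OCRv4_mobile_det/inference.pdmodel", "det.onnx"):
    print(f"{path}: {os.path.getsize(path) / 1e6:.1f} MB")
```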
@changdazhou It would be helpful to know:
- What OpenVINO version does v3 use vs v2?
- Are there known performance optimization guidelines for PP-OCRv4 + OpenVINO?
- Any plans to optimize the HPI integration overhead?
Happy to help with further testing and optimization! This kind of performance regression affects many users migrating from v2 to v3.
Thank you, @galafis and @openvino-book, for the help.
When I accelerate PP-OCRv4 models with OpenVINO in PaddleOCR v2, I get a significant speed-up even on my AMD CPU.
When I retried with PaddleOCR v3, even with the same models (PP-OCRv4), it's much slower, and I'm not really sure why. Furthermore, inference in PaddleOCR is now handled inside PaddleX, which makes it harder to understand how PaddleX compiles and uses OpenVINO and to compare it with the previous version.
I also tried with an Intel CPU (Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz) and the result is the same: PaddleOCR v3 with OpenVINO is much slower than v2 with OpenVINO.
I believe the most likely reason for this difference is that in PaddleOCR 2.x (which depends on PaddlePaddle 2.x), OpenVINO can directly load the pdmodel format. However, since the model format in PaddlePaddle 3.x has been changed to JSON, OpenVINO can no longer read it directly. As a result, PaddleOCR 3.x (which depends on PaddlePaddle 3.x) first converts the model to ONNX format, and then performs inference with OpenVINO.
To verify this, you can try converting a PaddleOCR 2.x model to ONNX using the Paddle2ONNX tool, and then run inference with OpenVINO to measure the speed.
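A rough sketch of that experiment, assuming a 2.x-format detection export in `en_PP-OCRv4_mobile_det/` (paths and the dummy input shape are placeholders):

```python
# 1) Convert the Paddle 2.x export to ONNX (run once, in a shell):
#    paddle2onnx --model_dir en_PP-OCRv4_mobile_det \
#                --model_filename inference.pdmodel \
#                --params_filename inference.pdiparams \
#                --save_file det.onnx
#
# 2) Time OpenVINO on both formats with the same dummy input.
import time
import numpy as np
import openvino.runtime as ov

core = ov.Core()
dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)  # placeholder shape

for path in ("en_PP-OCRv4_mobile_det/inference.pdmodel", "det.onnx"):
    compiled = core.compile_model(core.read_model(path), "CPU")
    compiled(dummy)  # warm-up
    start = time.time()
    for _ in range(20):
        compiled(dummy)
    print(f"{path}: {(time.time() - start) / 20 * 1000:.1f} ms/run")
```

If the ONNX path is consistently slower here as well, that would point to the format change, rather than the PaddleX HPI layer itself, as the main cause.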
Ah, thank you. So the OpenVINO speed can differ depending on whether the model is in .pdmodel or ONNX format. Maybe it's faster with .pdmodel?
That's very interesting! Thanks again for pointing this out.
How was this completed/solved?
Unfortunately, I get the same speed in PaddleOCR v2 with ONNX or .pdmodel; both are very fast. It's only in PaddleOCR v3 that it's slower.