[Performance] Dynamic Shape performance
Describe the issue
- I am using onnxruntime to run inference on CPU and GPU. The model input has a dynamic shape.
- On both CPU and GPU, onnxruntime inference time with static-shape inputs is shorter than with dynamic-shape inputs.
- Is there a way to optimize the model's inference time for dynamic-shape inputs?
To reproduce
import numpy as np
import onnxruntime as ort
from tqdm import tqdm
import time


class TestOrtInfer(object):
    def __init__(self, onnx_path, batch_size=1, total_samples=1000):
        self.onnx_path = onnx_path
        self.total_samples = total_samples
        self.batch_size = batch_size
        self.x = np.random.randn(*[batch_size, 3, 224, 224]).astype(np.float32)

    def init_session(self, use_gpu=False):
        self.use_gpu = use_gpu
        if self.use_gpu:
            exproviders = ["CUDAExecutionProvider", "CPUExecutionProvider"]
        else:
            exproviders = ["CPUExecutionProvider"]
        self.ort_session = ort.InferenceSession(self.onnx_path,
                                                providers=exproviders)
        self.input_name = self.ort_session.get_inputs()[0].name
        self.output_name = self.ort_session.get_outputs()[0].name

    def infer(self, is_dynamic=False):
        latency = []
        print('Number of runs:', self.total_samples)
        for i in tqdm(range(self.total_samples)):
            if is_dynamic:
                w = np.random.randint(128, 1024)
                w = int(round(w / 32) * 32)
                h = np.random.randint(128, 1024)
                h = int(round(h / 32) * 32)
            else:
                h, w = 576, 576
            self.x = np.random.randn(*[self.batch_size, 3, h, w]).astype(np.float32)
            t0 = time.time()
            self.ort_session.run(None, {self.input_name: self.x})
            latency.append(time.time() - t0)
        avg_time = sum(latency) * 1000 / len(latency)
        device = 'GPU' if self.use_gpu else 'CPU'
        print(f"Average onnxruntime {device} "
              f"Inference time = {avg_time:.2f} ms")


onnx_path = 'OCRv3_det_infer.onnx'
tester = TestOrtInfer(onnx_path, batch_size=1, total_samples=100)

# CPU Inference
tester.init_session(use_gpu=False)
tester.infer(is_dynamic=False)
tester.infer(is_dynamic=True)

# GPU Inference
tester.init_session(use_gpu=True)
tester.infer(is_dynamic=False)
tester.infer(is_dynamic=True)
The result:
| Device | Model | Input shape | Loops | Average cost |
| --- | --- | --- | --- | --- |
| CPU | OCRv3_det_infer.onnx | 1x3x576x576 | 100 | 283.94 ms |
| CPU | OCRv3_det_infer.onnx | 1x3xHxW (dynamic) | 100 | 321.17 ms |
| GPU | OCRv3_det_infer.onnx | 1x3x576x576 | 100 | 11.71 ms |
| GPU | OCRv3_det_infer.onnx | 1x3xHxW (dynamic) | 100 | 445.36 ms |
Urgency
No response
Platform
Linux
OS Version
Ubuntu
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.12.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
CUDA 11.2
Model File
Is this a quantized model?
No
You can try to run the same shape 10 times and discard the time from the first run. Your number should be comparable to the static ones. If you keep changing the shape for each run, a lot of cached data will be invalidated and rebuilt.
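For example, a minimal sketch of that measurement, reusing the session from the repro script above (the helper name, repeat count, and discard logic here are illustrative, not from the original report):

    # Sketch: time one fixed input shape several times and drop the first run,
    # which pays the one-time, shape-specific setup cost (memory planning,
    # cuDNN algorithm search, etc.).
    import time
    import numpy as np

    def time_one_shape(session, input_name, shape, repeats=10):
        x = np.random.randn(*shape).astype(np.float32)
        timings = []
        for _ in range(repeats):
            t0 = time.time()
            session.run(None, {input_name: x})
            timings.append(time.time() - t0)
        steady = timings[1:]  # discard the first (setup-heavy) run
        return sum(steady) * 1000 / len(steady)

    # Usage with the repro script's objects (tester.ort_session / tester.input_name):
    # tester.init_session(use_gpu=True)
    # print(time_one_shape(tester.ort_session, tester.input_name, (1, 3, 576, 576)))
    # print(time_one_shape(tester.ort_session, tester.input_name, (1, 3, 608, 800)))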
I have the same problem with the recognition model of PaddleOCR, due to its dynamic input shape [-1, 3, 48, -1]. My suggestion is to warm up the model before doing inference; here is a snippet of the model warmup:
...
def model_warmup(self, batch_size: int = 1, min_size: int = 300, max_size: int = 1500):
    """
    The recognition model has input shape [-1, 3, 48, -1].
    ONNXRuntime with CUDA support does not perform well with arbitrary input sizes,
    so we warm up the model over the range of widths we expect to see.
    """
    log.info("Warming up model...")
    for i in tqdm(range(min_size, max_size), desc="Warming up model"):
        dummy_input = np.random.randn(batch_size, 3, 48, i).astype(np.float32)
        self.recog_session.run([self.recog_output_name], {self.recog_input_name: dummy_input})
    log.info("Model warmup completed")
...
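A cheaper variant (assumed here, not part of the snippet above) is to warm up only the sizes you actually expect at inference time, e.g. widths that are multiples of 32 within the expected range, which matches the rounding used in the repro script:

    # Sketch: strided warmup over expected widths only.
    # `session`, `input_name`, `output_name` stand in for the recognition session above.
    import numpy as np

    def model_warmup_strided(session, input_name, output_name,
                             batch_size=1, min_size=128, max_size=1024, stride=32):
        for w in range(min_size, max_size + 1, stride):
            dummy = np.random.randn(batch_size, 3, 48, w).astype(np.float32)
            session.run([output_name], {input_name: dummy})

This keeps the warmup short while still covering every shape the model will later be asked to run, so no shape-specific setup cost lands in the serving path.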
> You can try to run the same shape 10 times and discard the time from the first run. Your number should be comparable to the static ones. If you keep changing the shape for each run, a lot of cached data will be invalidated and rebuilt.
@ytaous How many shape caches will ORT preserve per model? For example, if I have one model and run inference 10 times with different input shapes, will ORT keep only the cache for the last shape, or for the last N shapes? Or is this decided by some other policy?
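For reference, a large part of the shape-dependent work on the CUDA execution provider is the cuDNN convolution algorithm search that runs for each new input shape. One knob worth trying (taken from the CUDA EP provider options; nobody in this thread has confirmed results with it) is to switch the search mode from the exhaustive default to the heuristic one, so that preparing a new shape is cheaper:

    # Sketch: pass CUDA EP provider options to reduce per-shape setup cost.
    # "cudnn_conv_algo_search" accepts "EXHAUSTIVE" (default), "HEURISTIC", or "DEFAULT".
    import onnxruntime as ort

    providers = [
        ("CUDAExecutionProvider", {"cudnn_conv_algo_search": "HEURISTIC"}),
        "CPUExecutionProvider",
    ]
    session = ort.InferenceSession("OCRv3_det_infer.onnx", providers=providers)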
Any updates on this issue?
Any updates?
keep active
keep active
Applying stale label due to no activity in 30 days
any update?
Applying stale label due to no activity in 30 days