TensorRT Provider Vs TensorRT Native
Hello, I want to ask some questions about benchmark performance and inference time for ONNX Runtime with the TensorRT provider versus native TensorRT. Will ONNX Runtime be slower or perform worse than native TensorRT? This is the first time I am using ONNX Runtime with TensorRT; I usually use the CUDA provider. Thanks
The TensorRT EP can achieve performance parity with native TensorRT. One of the benefits of using the TensorRT EP is that it can run models that can't run in native TensorRT because they contain ops TensorRT doesn't support. ONNX Runtime automatically falls back to other EPs, such as CUDA or CPU, for those ops.
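A minimal sketch of what that fallback setup looks like from the Python API: listing the TensorRT EP first and CUDA/CPU after it lets ONNX Runtime assign TensorRT-supported subgraphs to TensorRT and the rest to the later providers. The model path below is a placeholder.

```python
import onnxruntime as ort

# Provider order is priority order: nodes TensorRT can't handle
# fall back to CUDA, and then to CPU.
providers = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]

# "model.onnx" is a placeholder path.
sess = ort.InferenceSession("model.onnx", providers=providers)

# Show which providers were actually registered for this session.
print(sess.get_providers())
```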
Is there any benchmark for it?
Any reproducible benchmark on this?
@pommedeterresautee Maybe you have some insights on this? I read https://github.com/ELS-RD/transformer-deploy/blob/d397869e95ee07570c47edefec01bdc673391b65/docs/faq.md#why-dont-you-support-gpu-quantization-on-onnx-runtime-instead-of-tensorrt , but it's not clear to me why ONNX Runtime + TensorrtExecutionProvider would be worse than native TensorRT, given that you start from an ONNX QDQ model. I had no issues with fp32 and int8 (static) quantized models using the TensorRT execution provider.
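A minimal sketch of what that int8 QDQ setup can look like, assuming the documented TensorRT EP options `trt_int8_enable` and `trt_fp16_enable`; the model path and cache directory are placeholders:

```python
import onnxruntime as ort

# TensorRT EP options; for QDQ models the quantization parameters come
# from the Q/DQ nodes, so no calibration table is needed.
trt_options = {
    "trt_fp16_enable": True,                  # allow fp16 kernels
    "trt_int8_enable": True,                  # honor int8 Q/DQ quantization
    "trt_engine_cache_enable": True,          # cache built engines between runs
    "trt_engine_cache_path": "./trt_cache",   # placeholder cache directory
}

providers = [
    ("TensorrtExecutionProvider", trt_options),
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]

# "model_qdq.onnx" is a placeholder for a statically quantized QDQ model.
sess = ort.InferenceSession("model_qdq.onnx", providers=providers)
```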