
TensorRT Provider Vs TensorRT Native

Open • hafidh561 opened this issue 2 years ago • 3 comments

Hello, I want to ask a few questions about benchmark performance and inference time for ONNX Runtime with the TensorRT provider versus native TensorRT. Will ONNX Runtime be slower or perform worse than native TensorRT? This is my first time using ONNX Runtime with TensorRT; I usually use the CUDA provider. Thanks.

hafidh561 • Jul 05 '22 10:07

The TensorRT EP can achieve performance parity with native TensorRT. One benefit of the TensorRT EP is that it can run models native TensorRT can't: if a model contains ops TensorRT doesn't support, ONNX Runtime automatically falls back to other EPs, such as CUDA or CPU, for those ops.

stevenlix • Jul 05 '22 18:07
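
As a concrete illustration of the fallback behavior described above, here is a minimal sketch (ONNX Runtime Python API) of creating a session with the TensorRT EP in front of CUDA and CPU fallbacks; `model.onnx` is a placeholder path:

```python
import onnxruntime as ort

# Providers are tried in priority order: TensorRT first, then CUDA, then CPU.
# Subgraphs with ops TensorRT can't handle are assigned to the next provider.
session = ort.InferenceSession(
    "model.onnx",  # placeholder model path
    providers=[
        "TensorrtExecutionProvider",
        "CUDAExecutionProvider",
        "CPUExecutionProvider",
    ],
)

# Shows which of the requested providers were actually registered.
print(session.get_providers())
```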

> The TensorRT EP can achieve performance parity with native TensorRT. One benefit of the TensorRT EP is that it can run models native TensorRT can't: if a model contains ops TensorRT doesn't support, ONNX Runtime automatically falls back to other EPs, such as CUDA or CPU, for those ops.

Is there any benchmark for it?

hafidh561 • Jul 06 '22 02:07

Any reproducible benchmark on this?

fxmarty • Sep 22 '22 12:09
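
A rough, reproducible latency comparison can be put together with the Python API alone. The sketch below is an illustrative example rather than an official benchmark: `model.onnx` and the 1x3x224x224 float32 input shape are placeholder assumptions, and the first run is excluded from timing because it includes the TensorRT engine build:

```python
import time
import numpy as np
import onnxruntime as ort

def mean_latency(providers, runs=100):
    # Build a session with the given provider list (placeholder model path).
    sess = ort.InferenceSession("model.onnx", providers=providers)
    name = sess.get_inputs()[0].name
    x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder input shape
    sess.run(None, {name: x})  # warm-up; TensorRT engine build happens here
    start = time.perf_counter()
    for _ in range(runs):
        sess.run(None, {name: x})
    return (time.perf_counter() - start) / runs

print("TensorRT EP:", mean_latency(["TensorrtExecutionProvider", "CUDAExecutionProvider"]))
print("CUDA EP:   ", mean_latency(["CUDAExecutionProvider"]))
```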

@pommedeterresautee Maybe you have some insights on this? I read https://github.com/ELS-RD/transformer-deploy/blob/d397869e95ee07570c47edefec01bdc673391b65/docs/faq.md#why-dont-you-support-gpu-quantization-on-onnx-runtime-instead-of-tensorrt, but it's not clear to me why ONNX Runtime + TensorrtExecutionProvider would be worse than native TensorRT, given that you start from an ONNX QDQ model. I had no issues with fp32 and int8 (static) quantized models using the TensorRT execution provider.

fxmarty • Oct 20 '22 12:10
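
For reference, running an int8 QDQ model through the TensorRT EP looks roughly like the sketch below. `model-qdq.onnx` is a placeholder path, and the `trt_*` keys are TensorRT EP provider options documented by ONNX Runtime:

```python
import onnxruntime as ort

providers = [
    (
        "TensorrtExecutionProvider",
        {
            "trt_int8_enable": True,          # use the Q/DQ scales baked into the model
            "trt_engine_cache_enable": True,  # cache built engines to cut warm-up time
            "trt_engine_cache_path": "./trt_cache",
        },
    ),
    "CUDAExecutionProvider",  # fallback for ops TensorRT can't take
    "CPUExecutionProvider",
]

session = ort.InferenceSession("model-qdq.onnx", providers=providers)
```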