Add support for loading ONNX files with the TensorRT backend
Describe the solution you'd like
Be able to simply use an ONNX file in the TensorRT backend, given that TensorRT has an ONNX parser. The backend would build the engine on warmup and cache it, roughly as in the sketch below.
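A minimal sketch (not Triton code) of the kind of flow being requested, assuming the standard TensorRT Python API with trt.OnnxParser; the function name and cache layout are hypothetical:

```python
# Hypothetical sketch: parse an ONNX file with TensorRT's own parser at
# load/warmup time, build an engine, and cache the serialized plan so
# later loads skip the slow build step.
import os
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_or_load_engine(onnx_path: str, cache_path: str) -> trt.ICudaEngine:
    runtime = trt.Runtime(TRT_LOGGER)

    # Reuse a previously built engine if one was cached.
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return runtime.deserialize_cuda_engine(f.read())

    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
            raise RuntimeError("ONNX parse failed: " + "; ".join(errors))

    config = builder.create_builder_config()
    plan = builder.build_serialized_network(network, config)

    # Cache the serialized engine next to the model.
    with open(cache_path, "wb") as f:
        f.write(plan)
    return runtime.deserialize_cuda_engine(plan)
```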
Describe alternatives you've considered
Using the onnxruntime backend instead, but it has problems: https://github.com/triton-inference-server/server/issues/4587 and https://github.com/microsoft/onnxruntime/issues/11356
I think what you want is already implemented: https://github.com/triton-inference-server/server/blob/main/docs/optimization.md#onnx-with-tensorrt-optimization-ort-trt
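For reference, the linked doc enables TensorRT through the ONNX Runtime backend by declaring it as a GPU execution accelerator in the model's config.pbtxt, along the lines of (the parameter values shown here are just examples):

```
optimization { execution_accelerators {
  gpu_execution_accelerator : [ {
    name : "tensorrt"
    parameters { key: "precision_mode" value: "FP16" }
    parameters { key: "max_workspace_size_bytes" value: "1073741824" }
  } ]
}}
```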
@joihn ORT is short for onnxruntime. What I'm asking for is to use TensorRT with ONNX files directly, without onnxruntime.
+1
@tanmayv25 thoughts?
I would not like to complicate the TensorRT backend by having it consume ONNX files and own the conversion itself. ORT already supports the TRT execution provider.
Closing this issue due to lack of activity. If this issue needs follow-up, please let us know and we can reopen it for you.