Add support for loading ONNX files with the TensorRT backend
Describe the solution you'd like
Be able to simply use an ONNX file in the TensorRT backend, given that TensorRT has an ONNX parser. The backend would build the engine on warmup and cache it, roughly as in the sketch below.
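A minimal sketch (not Triton code) of the kind of flow being requested, assuming the standard TensorRT Python API with trt.OnnxParser; the function name and cache layout are hypothetical:

```python
# Hypothetical sketch: parse an ONNX file with TensorRT's own parser at
# load/warmup time, build an engine, and cache the serialized plan so
# later loads skip the slow build step.
import os
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_or_load_engine(onnx_path: str, cache_path: str) -> trt.ICudaEngine:
    runtime = trt.Runtime(TRT_LOGGER)

    # Reuse a previously built engine if one was cached.
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return runtime.deserialize_cuda_engine(f.read())

    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
            raise RuntimeError("ONNX parse failed: " + "; ".join(errors))

    config = builder.create_builder_config()
    plan = builder.build_serialized_network(network, config)

    # Cache the serialized engine next to the model.
    with open(cache_path, "wb") as f:
        f.write(plan)
    return runtime.deserialize_cuda_engine(plan)
```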
Describe alternatives you've considered
Using the onnxruntime backend instead, but it has problems: https://github.com/triton-inference-server/server/issues/4587 and https://github.com/microsoft/onnxruntime/issues/11356
I think what you want is already implemented: https://github.com/triton-inference-server/server/blob/main/docs/optimization.md#onnx-with-tensorrt-optimization-ort-trt
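For reference, the linked doc enables TensorRT through the ONNX Runtime backend by declaring it as a GPU execution accelerator in the model's config.pbtxt, along the lines of (the parameter values shown here are just examples):

```
optimization { execution_accelerators {
  gpu_execution_accelerator : [ {
    name : "tensorrt"
    parameters { key: "precision_mode" value: "FP16" }
    parameters { key: "max_workspace_size_bytes" value: "1073741824" }
  } ]
}}
```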
@joihn ORT is short for onnxruntime. What I'm asking for is to use TensorRT with ONNX files directly, without onnxruntime.
+1
@tanmayv25 thoughts?
I would not like to complicate the TensorRT backend by having it consume ONNX files and own the conversion itself. ORT already supports the TRT execution provider.
Closing this issue due to lack of activity. If this issue needs follow-up, please let us know and we can reopen it for you.