Request for Supporting minShapes/optShapes/maxShapes for TensorRT
**Is your feature request related to a problem? Please describe.**
The ONNX Runtime backend in Triton Inference Server has no direct support for `minShapes`, `optShapes`, and `maxShapes` in the model configuration when TensorRT optimization is enabled. ONNX Runtime itself exposes these parameters for its TensorRT execution provider, so their absence in Triton's ONNX Runtime backend limits efficient handling of models with dynamic input shapes.
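For reference, this is roughly what the corresponding provider options look like in standalone ONNX Runtime (the input name `input`, the shapes, and the model path are placeholders, not taken from a real model):

```python
import onnxruntime as ort

# TensorRT EP provider options; shape strings use the format
# "<input_name>:<dim1>x<dim2>x...", comma-separated for multiple inputs.
trt_options = {
    "trt_profile_min_shapes": "input:1x3x224x224",   # smallest expected shape
    "trt_profile_opt_shapes": "input:8x3x224x224",   # shape TensorRT optimizes for
    "trt_profile_max_shapes": "input:32x3x224x224",  # largest expected shape
    "trt_engine_cache_enable": True,
}

session = ort.InferenceSession(
    "model.onnx",
    providers=[("TensorrtExecutionProvider", trt_options)],
)
```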
**Describe the solution you'd like**
I propose adding support for the following parameters directly in Triton's ONNX Runtime backend configuration (a sketch of a possible `config.pbtxt` follows below):

- `trt_profile_min_shapes`
- `trt_profile_opt_shapes`
- `trt_profile_max_shapes`

This addition would enable optimized handling of dynamic input sizes within Triton, improving the performance and flexibility of models that use TensorRT.
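A minimal sketch of how this could look in a model's `config.pbtxt`, assuming the proposed parameters are exposed as TensorRT accelerator options with the same value format as the ORT provider-option strings (none of the `trt_profile_*` keys are supported today; `input` and the shapes are placeholders):

```
optimization {
  execution_accelerators {
    gpu_execution_accelerator : [
      {
        name : "tensorrt"
        # Existing, supported accelerator option shown for context
        parameters { key: "precision_mode" value: "FP16" }
        # Proposed (not yet supported) options; value format assumed to
        # mirror ONNX Runtime's TensorRT provider-option strings
        parameters { key: "trt_profile_min_shapes" value: "input:1x3x224x224" }
        parameters { key: "trt_profile_opt_shapes" value: "input:8x3x224x224" }
        parameters { key: "trt_profile_max_shapes" value: "input:32x3x224x224" }
      }
    ]
  }
}
```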
**Describe alternatives you've considered**
Manually compiling the TensorRT engine with these shape ranges before loading it into Triton (e.g. with `trtexec`, as sketched below). However, this approach is less integrated and less flexible than direct support in the Triton configuration.
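For completeness, the manual workaround would look something like the following, where `input` and the shapes are placeholders for the model's actual dynamic inputs:

```sh
# Build a TensorRT engine with an explicit optimization profile
trtexec --onnx=model.onnx \
        --minShapes=input:1x3x224x224 \
        --optShapes=input:8x3x224x224 \
        --maxShapes=input:32x3x224x224 \
        --saveEngine=model.plan
```

The resulting `model.plan` would then be served through Triton's TensorRT backend rather than the ONNX Runtime backend, which is part of why this route is less integrated.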
#217 would solve this.