onnxruntime_backend
Add support for sharing an ORT session
By default, a new ORT session is created for every instance in a model instance group. This change adds support for sharing one session among the instances of an instance group. To enable it, set 'share_session_between_instances' to true in the "parameters" section of the Triton model config.
Example:
parameters [
  .....
  {
    key: "share_session_between_instances"
    value: { string_value: "true" }
  }
]
This is a global parameter and cannot be defined per instance group. The user should determine if the parameter makes sense for their setup.
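The sharing behavior described above can be sketched roughly as follows. This is an illustrative Python model only, not the backend's C++ code: `FakeSession`, `SessionRegistry`, and `sharing_enabled` are hypothetical names, and the dict layout used for the parameters and the exact string comparison against "true" are assumptions.

```python
import threading


class FakeSession:
    """Hypothetical stand-in for an ORT session (not an onnxruntime type)."""

    def __init__(self, model_path):
        self.model_path = model_path


def sharing_enabled(parameters):
    """Read the flag from a dict shaped like the config's parameters map.

    Assumption: parameters look like {key: {"string_value": value}}.
    """
    entry = parameters.get("share_session_between_instances", {})
    return entry.get("string_value", "false") == "true"


class SessionRegistry:
    """Hands out one session per instance group when sharing is on."""

    def __init__(self, share):
        self.share = share
        self._by_group = {}
        # Instances may be initialized concurrently, so guard the map.
        self._lock = threading.Lock()

    def session_for(self, group_name, model_path):
        if not self.share:
            # Default behavior: every instance gets its own session.
            return FakeSession(model_path)
        with self._lock:
            if group_name not in self._by_group:
                # First instance of the group creates and maps the session.
                self._by_group[group_name] = FakeSession(model_path)
            # Later instances of the same group reuse the mapped session.
            return self._by_group[group_name]


params = {"share_session_between_instances": {"string_value": "true"}}
registry = SessionRegistry(share=sharing_enabled(params))
first = registry.session_for("group_0", "model.onnx")
second = registry.session_for("group_0", "model.onnx")
other = registry.session_for("group_1", "model.onnx")
print(first is second, first is other)  # True False
```

In this sketch the flag is read once and applies to every group, which mirrors the point that the parameter is global rather than per instance group.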
When the log-info option of tritonserver is set to "1", the logs indicate that a session is mapped for the instance group when its first instance is initialized, and that it is reused for the other instances.
Example:
TRITONBACKEND_ModelInstanceInitialize:
Change-Id: I6dc509b9c2451e3dd14d45f6f150b37f50b5db89
I have compiled two images based on this PR for easy use. They are:
- docker pull jackiexiao/tritonserver:24.03-py3-onnx-share-session
- docker pull jackiexiao/tritonserver:24.03-onnx-py-cpu-onnx-share-session
The first image only replaces the ONNX backend while keeping everything else unchanged. The second image provides a smaller CPU version.