onnxruntime_backend

Add support for sharing an ORT session

Open quic-suppugun opened this issue 1 year ago • 2 comments

By default, a new ORT session is created for every instance in a model instance group. This change adds support for sharing a single session per instance group. To enable it, set 'share_session_between_instances' to true in the "parameters" section of the Triton model config. Example:

    parameters [
      .....
      {
        key: "share_session_between_instances"
        value: {string_value: "true"}
      }
    ]

This is a global parameter and cannot be defined per instance group. The user should determine if the parameter makes sense for their setup.

When the log-info option of tritonserver is set to "1", the logs show that a session is created and mapped to the instance group when the first instance is initialized, and that the same session is reused for the remaining instances. Example:

    TRITONBACKEND_ModelInstanceInitialize: _0_1 (CPU device 0)
    TRITONBACKEND_ModelInstanceInitialize: _0_0 (CPU device 0)
    Could not find a session corresponding to instance group: _0
    Created session for instance: _0_1
    Mapped session for instance group: _0
    Reusing session for instance: _0_0
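The lookup, create, map, and reuse sequence in those log lines can be sketched roughly as follows. This is only an illustration in Python, not the code from this PR (the backend itself is C++). The `SessionCache` class, its `get` method, and the rule that an instance name is the group name plus a trailing `_<index>` are assumptions inferred from the example log, and `create_session` stands in for whatever actually builds the ORT session.

```python
import threading
from typing import Callable, Dict, List


class SessionCache:
    """Hypothetical per-instance-group session cache (not the PR's actual code)."""

    def __init__(self, create_session: Callable[[], object]) -> None:
        self._create_session = create_session
        self._sessions: Dict[str, object] = {}
        # Instances may initialize concurrently, so guard the map with a lock.
        self._lock = threading.Lock()
        self.log: List[str] = []

    @staticmethod
    def group_of(instance_name: str) -> str:
        # Assumption: instance names look like "<group>_<index>", e.g. "_0_1".
        return instance_name.rsplit("_", 1)[0]

    def get(self, instance_name: str) -> object:
        group = self.group_of(instance_name)
        with self._lock:
            session = self._sessions.get(group)
            if session is None:
                self.log.append(
                    f"Could not find a session corresponding to instance group: {group}"
                )
                session = self._create_session()
                self.log.append(f"Created session for instance: {instance_name}")
                self._sessions[group] = session
                self.log.append(f"Mapped session for instance group: {group}")
            else:
                self.log.append(f"Reusing session for instance: {instance_name}")
            return session


# Stand-in factory: each call to object() returns a distinct placeholder "session".
cache = SessionCache(create_session=object)
first = cache.get("_0_1")   # creates and maps the session for group "_0"
second = cache.get("_0_0")  # reuses the session mapped for group "_0"
other = cache.get("_1_0")   # a different group gets its own session
```

In this sketch, instances in the same group receive the same session object, while each group still gets its own. Whether a shared session is appropriate depends on the setup, as noted above.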

Change-Id: I6dc509b9c2451e3dd14d45f6f150b37f50b5db89

quic-suppugun, Mar 20 '24 23:03

I have compiled two images based on this PR for easy use. They are:

  • docker pull jackiexiao/tritonserver:24.03-py3-onnx-share-session
  • docker pull jackiexiao/tritonserver:24.03-onnx-py-cpu-onnx-share-session

The first image only replaces the ONNX backend while keeping everything else unchanged. The second image provides a smaller CPU version.

Jackiexiao, Mar 29 '24 13:03