modelmesh-serving
Triton on-wire compression of request/response on HTTP
Is it possible to leverage Triton client features (https://github.com/triton-inference-server/client), such as on-wire compression of requests/responses over HTTP (https://github.com/triton-inference-server/server/blob/main/docs/inference_protocols.md#compression), through the current /infer endpoint?
If not, will this be implemented in a future ModelMesh release?
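For concreteness, "on-wire compression" here means gzip/deflate encoding of the HTTP body, advertised via the `Content-Encoding` and `Accept-Encoding` headers. A minimal stdlib sketch of what the Triton HTTP client does when compression is enabled (the tensor payload below is a hypothetical example, not tied to any real model):

```python
import gzip
import json

# Hypothetical KServe v2 /infer request body.
body = json.dumps({
    "inputs": [
        {"name": "INPUT0", "shape": [1, 4], "datatype": "FP32",
         "data": [1.0, 2.0, 3.0, 4.0]}
    ]
}).encode("utf-8")

# On-wire request compression: gzip the body and declare the encoding.
compressed = gzip.compress(body)
headers = {
    "Content-Type": "application/json",
    "Content-Encoding": "gzip",   # request body is gzip-compressed
    "Accept-Encoding": "gzip",    # ask the server to compress the response
}

# A server supporting on-wire compression decompresses transparently;
# the round trip recovers the original payload byte-for-byte.
assert gzip.decompress(compressed) == body
```

The question is whether the ModelMesh /infer endpoint honors these headers the way a standalone Triton server does, so that the client-side compression knobs have any effect.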