modelmesh-serving
Triton on-wire compression of request/response on HTTP
Is it possible to leverage Triton client features (https://github.com/triton-inference-server/client), such as on-wire compression of requests/responses over HTTP (https://github.com/triton-inference-server/server/blob/main/docs/inference_protocols.md#compression), through the current /infer endpoint?
If not, will this be implemented in a future ModelMesh release?
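For concreteness, "on-wire compression" here means gzip/deflate encoding of the HTTP body, advertised via the `Content-Encoding` and `Accept-Encoding` headers. A minimal stdlib sketch of what the Triton HTTP client does when compression is enabled (the tensor payload below is a hypothetical example, not tied to any real model):

```python
import gzip
import json

# Hypothetical KServe v2 /infer request body.
body = json.dumps({
    "inputs": [
        {"name": "INPUT0", "shape": [1, 4], "datatype": "FP32",
         "data": [1.0, 2.0, 3.0, 4.0]}
    ]
}).encode("utf-8")

# On-wire request compression: gzip the body and declare the encoding.
compressed = gzip.compress(body)
headers = {
    "Content-Type": "application/json",
    "Content-Encoding": "gzip",   # request body is gzip-compressed
    "Accept-Encoding": "gzip",    # ask the server to compress the response
}

# A server supporting on-wire compression decompresses transparently;
# the round trip recovers the original payload byte-for-byte.
assert gzip.decompress(compressed) == body
```

The question is whether the ModelMesh /infer endpoint honors these headers the way a standalone Triton server does, so that the client-side compression knobs have any effect.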