seldon-core
Add support on Executor for Triton's binary payload which improves performance
Describe the bug
Our clients deployed a Triton model using a SeldonDeployment.
Their inference pipeline uses the tritonclient[http] Python library.
It seems that seldon-engine doesn't support binary data (which the client sends to the model by default).
https://github.com/triton-inference-server/client/blob/main/src/python/library/tritonclient/http/__init__.py#L1656
Our response from the engine:
```json
{
  "status": {
    "code": 500,
    "info": "invalid character '\\x00' after top-level value",
    "status": "FAILURE"
  }
}
```
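The `\x00` in that error is the start of the raw tensor bytes: with `binary_data=True`, the Triton client sends a JSON header immediately followed by the binary tensor data, and announces the JSON part's length in the `Inference-Header-Content-Length` HTTP header. A minimal stdlib-only sketch of that wire format (tensor name, shape, and values are illustrative, not from the original report):

```python
import json
import struct

# JSON header as sent by the Triton client when binary_data=True;
# binary_data_size tells the server how many raw bytes follow for this input.
header = {
    "inputs": [{
        "name": "INPUT0",
        "shape": [1, 4],
        "datatype": "INT32",
        "parameters": {"binary_data_size": 16},
    }]
}
header_bytes = json.dumps(header).encode()
tensor_bytes = struct.pack("<4i", 1, 2, 3, 4)  # 4 little-endian int32 values
body = header_bytes + tensor_bytes             # full HTTP request body

# Treating the whole body as JSON fails on the trailing binary bytes,
# much like the engine's "invalid character '\x00'" error.
try:
    json.loads(body)
except ValueError as err:
    print("naive parse failed:", err)

# The correct approach: split at the length carried by the
# Inference-Header-Content-Length HTTP header.
json_length = len(header_bytes)
parsed = json.loads(body[:json_length])
values = struct.unpack("<4i", body[json_length:])
print(parsed["inputs"][0]["name"], values)  # -> INPUT0 (1, 2, 3, 4)
```

A proxy in front of triton-server therefore has to read that header and forward the body verbatim rather than parsing it all as JSON.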
To reproduce
- Define a model compatible with Triton (torch, ONNX, etc.).
- Deploy a SeldonDeployment using the TRITON_SERVER implementation.
- Set an HTTP endpoint to seldon-engine.
- Use the tritonclient Python library to run inference against the model.
- Use the binary_data=True flag in either inputs or outputs:
```python
import numpy as np
import tritonclient.http as httpclient

triton_client = httpclient.InferenceServerClient(url="<SELDON_ENGINE_HTTP_ENDPOINT>")

input0_data = np.arange(16, dtype=np.int32).reshape(1, 16)
input1_data = np.ones((1, 16), dtype=np.int32)

inputs = []
outputs = []
inputs.append(httpclient.InferInput('INPUT0', [1, 16], "INT32"))
inputs.append(httpclient.InferInput('INPUT1', [1, 16], "INT32"))
inputs[0].set_data_from_numpy(input0_data)  # binary_data=True by default
inputs[1].set_data_from_numpy(input1_data)
outputs.append(httpclient.InferRequestedOutput('OUTPUT0'))  # binary_data=True by default
outputs.append(httpclient.InferRequestedOutput('OUTPUT1'))
```
Expected behaviour
Since the tritonclient library is the primary way to connect to triton-server, we would really appreciate support for binary_data (in both requests and responses) sent to triton-server.
@shoval-a what version of Seldon Core are you using? The protobufs are the same when configuring for the v2 protocol; are you explicitly setting the protocol in your deployment? If you can share that as well, it would help. Finally, if this indeed does not work with the latest version of the service orchestrator (aka engine/executor), you can run with the no-engine label, which lets you interact directly with the Triton server, and it should work as it would with a local version of the Triton Docker image.
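For reference, explicitly selecting the v2 protocol in Seldon Core v1 is done with the `protocol` field on the SeldonDeployment spec. A minimal sketch (the name and modelUri are placeholders, not from the original report):

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: triton-example        # illustrative name
spec:
  protocol: kfserving          # explicitly select the v2 (KFServing) protocol
  predictors:
    - name: default
      graph:
        name: model
        implementation: TRITON_SERVER
        modelUri: gs://my-bucket/my-model   # placeholder model location
      replicas: 1
```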
@axsaucedo we are using v1.12, and I don't know what it means to explicitly set the protocol, so I guess we don't do that.
The no-engine option doesn't help us: we are trying to move Triton users to Seldon-Triton, and using that parameter basically keeps them as they were before, without the Seldon features we want to take advantage of.
@naor2013 we had an internal discussion about this, and it seems this is specifically a feature Triton implements to optimize large payloads, so this would be more of a feature request. It is something we'll want to explore, but it will need further investigation and prioritisation before we can confirm.
@axsaucedo Hello, I would like to know whether this problem can be solved soon, as it will decide whether or not we use Seldon in our production environment.
This works in v2