seldon-core
Add support on Executor for Triton's binary payload which improves performance
Describe the bug
Our clients deployed a Triton model using a SeldonDeployment.
Their inference pipeline uses the tritonclient[http] Python library.
It seems that seldon-engine doesn't support binary data (which the client sends to the model by default).
https://github.com/triton-inference-server/client/blob/main/src/python/library/tritonclient/http/__init__.py#L1656
Our response from the engine:
```json
{
  "status": {
    "code": 500,
    "info": "invalid character '\\x00' after top-level value",
    "status": "FAILURE"
  }
}
```
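The `\x00` in that error is the start of the raw tensor bytes: with `binary_data=True`, the Triton client sends a JSON header immediately followed by the binary tensor data, and announces the JSON part's length in the `Inference-Header-Content-Length` HTTP header. A minimal stdlib-only sketch of that wire format (tensor name, shape, and values are illustrative, not from the original report):

```python
import json
import struct

# JSON header as sent by the Triton client when binary_data=True;
# binary_data_size tells the server how many raw bytes follow for this input.
header = {
    "inputs": [{
        "name": "INPUT0",
        "shape": [1, 4],
        "datatype": "INT32",
        "parameters": {"binary_data_size": 16},
    }]
}
header_bytes = json.dumps(header).encode()
tensor_bytes = struct.pack("<4i", 1, 2, 3, 4)  # 4 little-endian int32 values
body = header_bytes + tensor_bytes             # full HTTP request body

# Treating the whole body as JSON fails on the trailing binary bytes,
# much like the engine's "invalid character '\x00'" error.
try:
    json.loads(body)
except ValueError as err:
    print("naive parse failed:", err)

# The correct approach: split at the length carried by the
# Inference-Header-Content-Length HTTP header.
json_length = len(header_bytes)
parsed = json.loads(body[:json_length])
values = struct.unpack("<4i", body[json_length:])
print(parsed["inputs"][0]["name"], values)  # -> INPUT0 (1, 2, 3, 4)
```

A proxy in front of triton-server therefore has to read that header and forward the body verbatim rather than parsing it all as JSON.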
To reproduce
- Define a model compatible with Triton (torch, ONNX, etc.).
- Deploy a SeldonDeployment using the TRITON_SERVER implementation.
- Set an HTTP endpoint to seldon-engine.
- Use the tritonclient Python library to run inference against the model.
- Use the binary_data=True flag in either inputs or outputs:
```python
import numpy as np
import tritonclient.http as httpclient

triton_client = httpclient.InferenceServerClient(url="<SELDON_ENGINE_HTTP_ENDPOINT>")

input0_data = np.arange(16, dtype=np.int32).reshape(1, 16)
input1_data = np.ones((1, 16), dtype=np.int32)

inputs = []
outputs = []
inputs.append(httpclient.InferInput('INPUT0', [1, 16], "INT32"))
inputs.append(httpclient.InferInput('INPUT1', [1, 16], "INT32"))
inputs[0].set_data_from_numpy(input0_data)  # binary_data=True by default
inputs[1].set_data_from_numpy(input1_data)
outputs.append(httpclient.InferRequestedOutput('OUTPUT0'))  # binary_data=True by default
outputs.append(httpclient.InferRequestedOutput('OUTPUT1'))
```
Expected behaviour
Since the tritonclient library is the primary way to connect to triton-server, we would really appreciate support for binary_data (in both requests and responses) sent to triton-server.
@shoval-a what version of Seldon Core are you using? The protobufs are the same when configuring for the v2 protocol; are you explicitly setting the protocol in your deployment? If you can share that as well, it would help. Finally, if this indeed does not work with the latest version of the service orchestrator (aka engine/executor), you can run with the no-engine label, which lets you interact directly with the Triton server, and it should work as it would with a local version of the Triton Docker image.
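For reference, explicitly selecting the v2 protocol in Seldon Core v1 is done with the `protocol` field on the SeldonDeployment spec. A minimal sketch (the name and modelUri are placeholders, not from the original report):

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: triton-example        # illustrative name
spec:
  protocol: kfserving          # explicitly select the v2 (KFServing) protocol
  predictors:
    - name: default
      graph:
        name: model
        implementation: TRITON_SERVER
        modelUri: gs://my-bucket/my-model   # placeholder model location
      replicas: 1
```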
@axsaucedo we are using v1.12, and I don't know what it means to explicitly set the protocol, so I guess we don't do that.
The no-engine option doesn't help us: we are trying to move Triton users to Seldon-Triton, and using that parameter basically keeps them as they were before, without the Seldon features we want to take advantage of.
@naor2013 we had an internal discussion about this, and it seems this is specifically a feature Triton implements to optimize large payloads, so this would be more of a feature request. It is something we'll want to explore, but it will need further investigation and prioritisation before we can confirm.
@axsaucedo Hello, I would like to know whether this problem can be solved soon, as it will decide whether or not we use Seldon in our production environment.
This works in v2