
Add support on Executor for Triton's binary payload which improves performance

Open shoval-a opened this issue 3 years ago • 4 comments

Describe the bug

Our clients deployed a Triton model using a SeldonDeployment. Their inference pipeline uses the tritonclient[http] Python library. It seems that seldon-engine doesn't support binary data, which the client sends to the model by default: https://github.com/triton-inference-server/client/blob/main/src/python/library/tritonclient/http/init.py#L1656

Our response from the engine:

{
	"status": {
		"code": 500,
		"info": "invalid character '\\x00' after top-level value",
		"status": "FAILURE"
	}
}
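The error above is consistent with the executor parsing the whole request body as JSON. With `binary_data=True`, Triton's "binary tensor data" HTTP extension places a JSON header first and the raw tensor bytes immediately after it, with the header's length carried in the `Inference-Header-Content-Length` HTTP header. A minimal sketch of that body layout (field values and names here are illustrative, not the client library's internals):

```python
import json
import numpy as np

# Example tensor matching the reproduction below.
tensor = np.arange(16, dtype=np.int32).reshape(1, 16)
raw = tensor.tobytes()

# JSON header describing the input; binary_data_size tells the server how many
# raw bytes follow the JSON portion of the body.
header = json.dumps({
    "inputs": [{
        "name": "INPUT0",
        "shape": [1, 16],
        "datatype": "INT32",
        "parameters": {"binary_data_size": len(raw)},
    }]
}).encode("utf-8")

# The HTTP body is the JSON header followed immediately by the raw bytes.
# A proxy that JSON-decodes the entire body trips over the first raw byte
# after the top-level JSON value -- e.g. Go's encoding/json reports
# "invalid character '\x00' after top-level value", matching the error above.
body = header + raw
```

This is why the request works against Triton directly but fails once it passes through seldon-engine.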

To reproduce

  1. Define a model compatible with Triton (Torch, ONNX, etc.).
  2. Deploy a SeldonDeployment using the TRITON_SERVER implementation.
  3. Expose an HTTP endpoint to seldon-engine.
  4. Use the tritonclient Python library to run inference against the model.
  5. Leave binary_data=True (the default) on either the inputs or the outputs:

    import numpy as np
    import tritonclient.http as httpclient

    triton_client = httpclient.InferenceServerClient(url="<SELDON_ENGINE_HTTP_ENDPOINT>")

    input0_data = np.zeros((1, 16), dtype=np.int32)
    input1_data = np.zeros((1, 16), dtype=np.int32)

    inputs = []
    outputs = []
    inputs.append(httpclient.InferInput('INPUT0', [1, 16], "INT32"))
    inputs.append(httpclient.InferInput('INPUT1', [1, 16], "INT32"))

    inputs[0].set_data_from_numpy(input0_data)  # binary_data=True by default
    inputs[1].set_data_from_numpy(input1_data)

    outputs.append(httpclient.InferRequestedOutput('OUTPUT0'))  # binary_data=True by default
    outputs.append(httpclient.InferRequestedOutput('OUTPUT1'))

    results = triton_client.infer(model_name="<MODEL_NAME>", inputs=inputs, outputs=outputs)

Expected behaviour

Since the tritonclient library is the standard way to connect to triton-server, we would really appreciate it if you could add support for binary data in both requests and responses sent to triton-server.

shoval-a avatar Mar 07 '22 09:03 shoval-a

@shoval-a what version of Seldon Core are you using? The protobufs are the same when configuring for the v2 protocol; are you explicitly setting the protocol in your deployment? If you can share that as well, it would help. Finally, if this indeed does not work with the latest version of the service orchestrator (aka engine/executor), you can run with the no-engine label, which lets you interact directly with the Triton server; it should then behave as it would with a local Triton Docker image.
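For reference, a sketch of what both of these might look like in a SeldonDeployment manifest. The `protocol` field and the `seldon.io/no-engine` predictor annotation are real Seldon Core v1.x settings; the resource name, model URI, and graph layout below are illustrative assumptions:

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: triton-example          # hypothetical name
spec:
  protocol: kfserving           # explicitly select the v2 inference protocol
  predictors:
    - name: default
      annotations:
        seldon.io/no-engine: "true"   # bypass the executor, talk to Triton directly
      graph:
        name: model
        implementation: TRITON_SERVER
        modelUri: gs://<MODEL_BUCKET>/model   # placeholder URI
      replicas: 1
```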

axsaucedo avatar Mar 07 '22 10:03 axsaucedo

@axsaucedo we are using v1.12, and I don't know what it means to explicitly set the protocol, so I guess we don't do that.

The no-engine option doesn't help us: we are trying to move Triton users to Seldon-Triton, and using that parameter basically keeps them where they were before, without the Seldon features we want to take advantage of.

naor2013 avatar Mar 07 '22 11:03 naor2013

@naor2013 we had an internal discussion about this, and it seems this is specifically a feature Triton implements to optimise large payloads, so this would be more of a feature request. We think it is something we'll want to explore, but it will need further investigation and prioritisation before we can confirm anything.

axsaucedo avatar Mar 17 '22 10:03 axsaucedo

@axsaucedo Hello, could you let us know whether this can be resolved soon? It will determine whether or not we use Seldon in our production environment.

FlyTOmeLight avatar Aug 15 '22 16:08 FlyTOmeLight

This works in Seldon Core v2.

ukclivecox avatar Dec 19 '22 11:12 ukclivecox