
[Question] Working with custom request schemas

Open · kheyer opened this issue 5 months ago • 1 comment

I'm looking to use pytriton to run inference for a model. We have existing systems that send requests with the format:

{
    "embedding": [...]
}

However, Triton/pytriton requires the format:

{
    "inputs" : [
        {
            "name" : "embedding",
            "shape" : [1, embedding_size],
            "datatype" : "FP32",
            "data" : [...]
        }
    ]
}

Is there a way to change the input schema of pytriton or parse a custom request in the inference function?

One obvious workaround would be to set up a FastAPI server that receives requests in the first format, transforms them into the pytriton format, forwards the request to pytriton, and returns the response. However, this requires adding another server, another network hop, an extra round of serializing/deserializing the request data, and so on.
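
A rough sketch of that proxy, assuming the model is bound under the name "my_model" and pytriton serves HTTP on the default Triton port 8000 (both are placeholders for the real deployment):

from typing import List

import requests
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Assumed model name and default Triton HTTP port; adjust for the real deployment.
TRITON_INFER_URL = "http://localhost:8000/v2/models/my_model/infer"


class EmbeddingRequest(BaseModel):
    embedding: List[float]


@app.post("/predict")
def predict(req: EmbeddingRequest):
    # Wrap the flat payload in the KServe v2 schema that Triton expects.
    payload = {
        "inputs": [
            {
                "name": "embedding",
                "shape": [1, len(req.embedding)],
                "datatype": "FP32",
                "data": req.embedding,
            }
        ]
    }
    resp = requests.post(TRITON_INFER_URL, json=payload, timeout=30)
    resp.raise_for_status()
    # Triton returns its results under "outputs"; unwrap or rename them here as needed.
    return resp.json()["outputs"]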

Is there a solution within pytriton that avoids this complexity? For example, processing the raw request in the infer function.

For instance, the vLLM example appears to process a different request schema (the one from the Triton generate documentation) using an infer function that takes a Request as input, but I'm not sure whether that's unique to the generate endpoint.

kheyer · Sep 24 '24 19:09