
Async HTTP Python Client not working properly

Open mutkach opened this issue 1 year ago • 1 comment

Description

Requesting a model config through the async HTTP client fails with: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa6 in position 5: invalid start byte.

When observing the HTTP response directly using tritonclient.http.aio without the wrapper, I noticed that the async client does not decompress the HTTP response by itself, so it seems fixable by setting auto_decompress=True on the aiohttp.ClientSession. I believe in my case the compression is imposed elsewhere (by nginx?).

If that was intended (auto_decompress=True is aiohttp's default, so the client must be disabling it deliberately), then some additional logic is required to process the compressed response; calling brotli.decompress() fixed it in my case. In any case I'll be happy to provide the necessary fixes.
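For illustration, here is a rough sketch of the manual workaround, not the actual client code: the function name is mine, the endpoint path follows Triton's HTTP/REST API, and error handling is omitted.

import json

import aiohttp
import brotli


async def fetch_model_config(base_url: str, model: str) -> dict:
    # auto_decompress=False mirrors the behavior I observed in the wrapped
    # client; with auto_decompress=True aiohttp would decompress by itself.
    async with aiohttp.ClientSession(auto_decompress=False) as session:
        async with session.get(f"{base_url}/v2/models/{model}/config") as resp:
            body = await resp.read()
            # When nginx (or another proxy) applies Brotli compression,
            # the body must be decompressed before JSON parsing.
            if resp.headers.get("Content-Encoding") == "br":
                body = brotli.decompress(body)
            return json.loads(body)

With the decompression step in place, json.loads() succeeds instead of raising the UnicodeDecodeError above.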

Triton Information

  • Triton server version 2.38.0
  • Triton container 23.09
  • tritonclient==2.33.0, 2.42.0
  • nvidia-pytriton==0.2.5, 0.3.0 and 0.5.1 when built from source

To Reproduce

import asyncio

from pytriton.client import AsyncioModelClient, ModelClient

HOST = "HOST:80"  # placeholder for the actual server address
model_name = "intent_classifier"  # does not work with the async client
message = "Simple and correct query for testing"

async def run_classification(inferer_server_endpoint: str, clf_model_name: str, message: str, **_):
    async with AsyncioModelClient(inferer_server_endpoint, clf_model_name) as client:
        inference_client = client.create_client_from_url(inferer_server_endpoint)
        # Fails with UnicodeDecodeError: the response body is still compressed.
        config = await inference_client.get_model_config(clf_model_name)
        print(config)

def sync_classification(inferer_server_endpoint: str, clf_model_name: str, message: str, **_):
    with ModelClient(inferer_server_endpoint, clf_model_name) as client:
        inference_client = client.create_client_from_url(inferer_server_endpoint)
        config = inference_client.get_model_config(clf_model_name)
        print(config)

if __name__ == "__main__":
    sync_classification(HOST, model_name, message)  # Works
    # asyncio.run(run_classification(HOST, model_name, message))  # Does not

The model config is attached: classifier_config.txt

Expected behavior

The model configuration should be returned as valid JSON that can be parsed into a Python dict.

mutkach avatar Feb 15 '24 09:02 mutkach

Thanks for reporting the issue. I have filed a ticket for us to investigate further.

In any case I'll be happy to provide the necessary fixes.

Any contribution is welcome!

kthui avatar Feb 15 '24 17:02 kthui

Hi @mutkach, I took a deeper look into the Python AsyncIO client, and it seems we already have decompression built in. When calling the async infer(), it will:

  1. read the Content-Encoding header from the response headers,
  2. pass the Content-Encoding value to the InferResult class that reads the response body,
  3. have the InferResult class auto-decompress the response body based on the Content-Encoding header set by the server (a rough sketch of this flow follows the list).
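The sketch below is an illustrative paraphrase of that flow, not the actual tritonclient source; the function name is hypothetical.

import gzip
import zlib
from typing import Optional


def decompress_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    # The encoding comes from the response's Content-Encoding header
    # set by the server; decompress the raw body accordingly.
    if content_encoding == "gzip":
        return gzip.decompress(body)
    if content_encoding == "deflate":
        return zlib.decompress(body)
    return body  # no (or unrecognized) encoding: return the body unchanged

In a flow like this, an encoding the client does not handle (for example, br injected by a proxy) would pass through undecompressed, which would produce exactly the UnicodeDecodeError you describe.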

I believe in my case the compression is imposed elsewhere (by nginx?).

Would you be able to share the response headers received when encountering this issue?
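If it helps, something like this should capture them: a plain aiohttp call against Triton's HTTP config endpoint (replace the URL and model name with yours).

import asyncio

import aiohttp


async def dump_headers(base_url: str, model: str) -> None:
    # Print all response headers, in particular Content-Encoding,
    # to see where the compression is being applied.
    async with aiohttp.ClientSession() as session:
        async with session.get(f"{base_url}/v2/models/{model}/config") as resp:
            for name, value in resp.headers.items():
                print(f"{name}: {value}")


asyncio.run(dump_headers("http://HOST:80", "intent_classifier"))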

kthui avatar Mar 09 '24 02:03 kthui