Add InferRequest capability to Python client

Open andompesta opened this issue 1 year ago • 0 comments

Hi all,

I ran into an uncommon use case, and I wonder whether others are facing the same issue. I'm using Triton Inference Server inside a SageMaker endpoint, and I have enabled logging through the data-capture capability. This enables us to log all the requests and responses sent to and from the endpoint with no additional latency. However, when the binary_data format is used for fast requests, the logged payloads are not human-readable:

For example:

{
    "captureData": {
        "endpointInput": {
            "observedContentType": "application/vnd.sagemaker-triton.binary+json;json-header-size=***",
            "mode": "INPUT",
            "data": "eyJpbnB1dHMiOlt7Im5hbWUiOiJhdWN0aW...",
            "encoding": "BASE64"
        },
        "endpointOutput": {
            "observedContentType": "application/vnd.sagemaker-triton.binary+json;json-header-size=***",
            "mode": "OUTPUT",
            "data": "eyJtb2RlbF9uYW1lIjoiY2ZfY2F0YWxvZ19zZ...",
            "encoding": "BASE64"
        }
    },
    "eventMetadata": {
        "eventId": "****",
        "inferenceTime": "2000-01-01T00:00:00Z"
    },
    "eventVersion": "0"
}
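The payload is unreadable because `data` is the base64-encoded raw HTTP body: a JSON header followed directly by raw tensor bytes, with `json-header-size` in the content type marking where the header ends. Below is a minimal sketch of that first decoding step using only the standard library. The names `CONTENT_TYPE_PREFIX`, `parse_header_size` and `split_capture` are hypothetical helpers for illustration, and the prefix string is assumed from the `observedContentType` shown above.

```python
import base64
import json
from typing import Tuple

# Assumed prefix of the SageMaker Triton binary content type; the numeric
# header size follows it (masked as *** in the log above).
CONTENT_TYPE_PREFIX = (
    "application/vnd.sagemaker-triton.binary+json;json-header-size="
)


def parse_header_size(observed_content_type: str) -> int:
    """Return the JSON header length encoded in the content type."""
    if not observed_content_type.startswith(CONTENT_TYPE_PREFIX):
        raise ValueError(f"unexpected content type: {observed_content_type!r}")
    return int(observed_content_type[len(CONTENT_TYPE_PREFIX):])


def split_capture(record: dict) -> Tuple[dict, bytes]:
    """Split one endpointInput/endpointOutput record into header and tensor bytes."""
    body = base64.b64decode(record["data"])
    header_size = parse_header_size(record["observedContentType"])
    # The first header_size bytes are JSON; the rest is raw binary tensor data.
    header = json.loads(body[:header_size])
    return header, body[header_size:]
```

Calling `split_capture(data["captureData"]["endpointInput"])` would give the readable JSON header plus the still-binary tensor payload, which then needs a tensor-aware decoder.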

To convert this into a meaningful format, I have developed an InferRequest class based on the already available InferResult class.

This enables decoding the full log with:

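import base64
import json

from tritonclient.http import InferResult

# InferRequest is the class proposed in this issue; it is not part of tritonclient yet.

# Assumed value: the observedContentType prefix that precedes the header size.
content_type_template = "application/vnd.sagemaker-triton.binary+json;json-header-size="

# `line` is one JSON record read from the data-capture log.
data = json.loads(line)

# Decode the response: the header size is the number after the content-type prefix.
outputs = data["captureData"]["endpointOutput"]
outputs_header_length = int(outputs["observedContentType"][len(content_type_template):])
outputs_data = base64.b64decode(outputs["data"])
response = InferResult.from_response_body(
    outputs_data,
    header_length=outputs_header_length,
)

# Decode the request in the same way with the proposed InferRequest class.
inputs = data["captureData"]["endpointInput"]
inputs_header_length = int(inputs["observedContentType"][len(content_type_template):])
inputs_data = base64.b64decode(inputs["data"])
request = InferRequest.from_request_body(
    inputs_data,
    header_length=inputs_header_length,
)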
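For readers without the proposed class, here is a rough sketch of what decoding a request body involves. It is not the InferRequest implementation from this issue. It assumes the layout of Triton's binary tensor data extension: each input with a `binary_data_size` parameter has that many little-endian bytes after the JSON header, in the order the inputs are listed. `decode_request_body` and `_STRUCT_CODES` are hypothetical names, and only a few fixed-width datatypes are handled.

```python
import json
import struct
from typing import Any, Dict

# struct format characters for a few fixed-width Triton datatypes (not exhaustive).
_STRUCT_CODES = {
    "INT8": "b", "INT16": "h", "INT32": "i", "INT64": "q",
    "UINT8": "B", "UINT16": "H", "UINT32": "I", "UINT64": "Q",
    "FP32": "f", "FP64": "d",
}


def decode_request_body(body: bytes, header_length: int) -> Dict[str, Any]:
    """Map each input name to a flat list of values (or raw bytes if unsupported)."""
    header = json.loads(body[:header_length])
    offset = header_length
    tensors: Dict[str, Any] = {}
    for spec in header.get("inputs", []):
        size = spec.get("parameters", {}).get("binary_data_size")
        if size is None:
            # Tensor was sent inline in the JSON header, not as binary data.
            tensors[spec["name"]] = spec.get("data")
            continue
        raw = body[offset:offset + size]
        offset += size
        code = _STRUCT_CODES.get(spec["datatype"])
        if code is None:
            # Leave unsupported types (e.g. BYTES, FP16) undecoded.
            tensors[spec["name"]] = raw
        else:
            count = size // struct.calcsize("<" + code)
            tensors[spec["name"]] = list(struct.unpack(f"<{count}{code}", raw))
    return tensors
```

The values come back flattened; reshaping them with the `shape` field from the header (for example via NumPy) is left out to keep the sketch dependency-free.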

andompesta avatar Mar 16 '23 13:03 andompesta