Add InferRequest capability to Python client
Hi all,
I was facing an uncommon use case, but I wonder if others are facing the same issue. I'm using Triton Inference Server inside a SageMaker endpoint, and I have enabled logging through the data-capture capability. This lets us log all the requests and responses sent to/from the endpoint with no additional latency. However, because the binary_data format is used for fast requests, the logs come out in an unreadable format:
For example:
{
  "captureData": {
    "endpointInput": {
      "observedContentType": "application/vnd.sagemaker-triton.binary+json;json-header-size=***",
      "mode": "INPUT",
      "data": "eyJpbnB1dHMiOlt7Im5hbWUiOiJhdWN0aW...",
      "encoding": "BASE64"
    },
    "endpointOutput": {
      "observedContentType": "application/vnd.sagemaker-triton.binary+json;json-header-size=***",
      "mode": "OUTPUT",
      "data": "eyJtb2RlbF9uYW1lIjoiY2ZfY2F0YWxvZ19zZ...",
      "encoding": "BASE64"
    }
  },
  "eventMetadata": {
    "eventId": "****",
    "inferenceTime": "2000-01-01T00:00:00Z"
  },
  "eventVersion": "0"
}
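For context, the header length needed for decoding is embedded in the observedContentType string, after the json-header-size= suffix. A minimal sketch of extracting it (assuming the content-type prefix is fixed, as in the captured records above):

```python
# Prefix of the SageMaker-Triton binary+json content type; the JSON header
# length (in bytes) follows it.
content_type_template = "application/vnd.sagemaker-triton.binary+json;json-header-size="

def header_length_from_content_type(content_type: str) -> int:
    """Return the json-header-size value carried in the content type."""
    if not content_type.startswith(content_type_template):
        raise ValueError(f"unexpected content type: {content_type!r}")
    return int(content_type[len(content_type_template):])

print(header_length_from_content_type(content_type_template + "137"))  # 137
```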
To convert this into a meaningful format I have developed an InferRequest class based on the already available InferResult class. This enables decoding the full log with:
import base64
import json

# InferResult already exists in tritonclient.http; InferRequest is the
# proposed new class.
from tritonclient.http import InferRequest, InferResult

# Prefix of the SageMaker-Triton content type; the header length follows it.
content_type_template = "application/vnd.sagemaker-triton.binary+json;json-header-size="

# `line` is one JSON record from the data-capture log file.
data = json.loads(line)

outputs = data["captureData"]["endpointOutput"]
outputs_content_type = outputs["observedContentType"]
outputs_header_length = int(outputs_content_type[len(content_type_template):])
outputs_data = base64.b64decode(outputs["data"])
response = InferResult.from_response_body(
    outputs_data,
    header_length=outputs_header_length,
)

inputs = data["captureData"]["endpointInput"]
inputs_content_type = inputs["observedContentType"]
inputs_header_length = int(inputs_content_type[len(content_type_template):])
inputs_data = base64.b64decode(inputs["data"])
request = InferRequest.from_request_body(
    inputs_data,
    header_length=inputs_header_length,
)
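To illustrate what such an InferRequest has to do, here is a self-contained sketch of the parsing logic for the binary+json wire format: a JSON header describing the inputs, followed by the raw tensor bytes concatenated in input order. The class name, the as_tuple accessor, and the limited datatype table are all hypothetical simplifications, not the proposed implementation.

```python
import json
import struct

class MiniInferRequest:
    """Toy stand-in for the proposed InferRequest (sketch only; the real
    class would mirror tritonclient.http.InferResult and cover all dtypes)."""

    _STRUCT_CHAR = {"FP32": "f", "FP64": "d", "INT32": "i", "INT64": "q"}

    def __init__(self, header, binary):
        self._header = header
        self._binary = binary

    @classmethod
    def from_request_body(cls, body, header_length=None):
        cut = header_length if header_length is not None else len(body)
        return cls(json.loads(body[:cut]), body[cut:])

    def as_tuple(self, name):
        # Binary tensor blobs follow the JSON header in input order; walk
        # the offsets using each input's binary_data_size parameter.
        offset = 0
        for inp in self._header["inputs"]:
            size = inp["parameters"]["binary_data_size"]
            if inp["name"] == name:
                ch = self._STRUCT_CHAR[inp["datatype"]]
                count = size // struct.calcsize(ch)
                return struct.unpack(f"<{count}{ch}", self._binary[offset:offset + size])
            offset += size
        raise KeyError(name)

# Toy request body in the binary+json wire format.
blob = struct.pack("<3f", 0.5, 1.5, 2.5)
head = json.dumps({"inputs": [{"name": "INPUT0", "shape": [3], "datatype": "FP32",
                               "parameters": {"binary_data_size": len(blob)}}]}).encode()
req = MiniInferRequest.from_request_body(head + blob, header_length=len(head))
print(req.as_tuple("INPUT0"))  # (0.5, 1.5, 2.5)
```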