How to send byte or string array data in perf_analyzer?
Triton Inference Server: r24.07, model_analyzer: 1.42.0. The model's config.pbtxt:
backend: "python"
max_batch_size: 32
input [
{
name: "IN0"
data_type: TYPE_STRING
dims: [ 16 ]
}
]
output [
{
name: "OUT0"
data_type: TYPE_FP64
dims: [ 1 ]
}
]
instance_group [
{
count:1
kind: KIND_CPU
}
]
dynamic_batching {
max_queue_delay_microseconds: 2500
}
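For context, a minimal model.py sketch that matches this config (an illustrative skeleton, not the actual model; the real scoring logic is omitted):

import numpy as np
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            # TYPE_STRING tensors arrive as numpy object arrays of bytes.
            in0 = pb_utils.get_input_tensor_by_name(request, "IN0").as_numpy()
            # Decode each element; in0 has shape [batch, 16].
            rows = [[b.decode("utf-8") for b in row] for row in in0]
            # Placeholder output: one FP64 score per batch row.
            out0 = np.zeros((len(rows), 1), dtype=np.float64)
            responses.append(pb_utils.InferenceResponse(
                output_tensors=[pb_utils.Tensor("OUT0", out0)]))
        return responses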
Inference works when invoked directly with curl:
curl -v -X POST http://x.xx.xx.xx:80xx/v2/models/model_name/infer -H "Content-Type: application/json" -d '{
  "inputs": [
    {
      "name": "IN0",
      "shape": [1, 16],
      "datatype": "BYTES",
      "data": [
        ["0", "0", "2002", "9", "9", "9", "40", "19", "65.5", "Swipe Transaction", "-3345936507911876459", "La Verne", "CA", "91750", "7538", "Technical Glitch"]
      ]
    }
  ],
  "outputs": [
    {
      "name": "OUT0"
    }
  ]
}'
< HTTP/1.1 200 OK
< Content-Type: application/json
< Content-Length: 117
<
{"model_name":"model_name","model_version":"1","outputs":[{"name":"OUT0","datatype":"FP64","shape":[1],"data":[0.0]}]}
But when passing the data to perf_analyzer via --input-data input.json, where the JSON looks like:
{
  "data": [
    {
      "IN0": {
        "content": [
          ["17", "2", "2007", "6", "30", "16", "15", "0", "5.4", "Swipe Transaction", "-6571010470072147219", "Bloomville", "OH", "44818", "5499", "Bad PIN"]
        ],
        "shape": [1, 16],
        "datatype": "BYTES"
      }
    }
  ]
}
I get either the error:
Thread [0] had error: [request id: <id_unknown>] expected 16 string elements for inference input 'IN0', got 1
or:
error: Failed to init manager inputs: unable to find string data in json
How should string data be passed?
Hi @Kanupriyagoyal, try this:
{
  "data": [
    {
      "IN0": {
        "content": ["17", "2", "2007", "6", "30", "16", "15", "0", "5.4", "Swipe Transaction", "-6571010470072147219", "Bloomville", "OH", "44818", "5499", "Bad PIN"],
        "shape": [16]
      }
    }
  ]
}
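Note that "content" holds the 16 string elements as a flat list, and the shape omits the batch dimension, which perf_analyzer supplies from the -b flag for batched models. If you need to generate such files from many records, a small sketch (the file name and record values are illustrative):

import json

records = [
    ["17", "2", "2007", "6", "30", "16", "15", "0", "5.4",
     "Swipe Transaction", "-6571010470072147219", "Bloomville",
     "OH", "44818", "5499", "Bad PIN"],
]

# One step per record; "content" is a flat list of strings.
data = [{"IN0": {"content": rec, "shape": [len(rec)]}} for rec in records]

with open("input_suggested.json", "w") as f:
    json.dump({"data": data}, f, indent=2)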
@nv-hwoo I tried that and got:
I0822 04:24:42.997591 111595 infer_handler.cc:975] "[request id: <id_unknown>] Infer failed: [request id: <id_unknown>] expected 16 string elements for inference input 'IN0', got 1"
I0822 04:24:42.997662 111595 infer_handler.h:1311] "Received notification for ModelInferHandler, 0"
I0822 04:24:42.997667 111595 infer_handler.cc:728] "Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE"
I0822 04:24:42.997685 111595 infer_handler.cc:728] "Process for ModelInferHandler, rpc_ok=1, 0 step FINISH"
input_suggested.json:
{
  "data": [
    {
      "IN0": {
        "content": ["17", "2", "2007", "6", "30", "16", "15", "0", "5.4", "Swipe Transaction", "-6571010470072147219", "Bloomville", "OH", "44818", "5499", "Bad PIN"],
        "shape": [16]
      }
    }
  ]
}
perf_analyzer -m xgb_model --service-kind=triton --model-repository=/models -b 1 -u localhost:8001 -i grpc -f xgb_model.csv --verbose-csv --concurrency-range 1 --measurement-mode count_windows --input-tensor-format json --input-data input_suggested.json --collect-metrics --metrics-url http://localhost:8002/metrics --metrics-interval 1000
Successfully read data for 1 stream/streams with 1 step/steps.
*** Measurement Settings ***
Batch size: 1
Service Kind: TRITON
Using "count_windows" mode for stabilization
Stabilizing using average latency and throughput
Minimum number of samples in each window: 50
Using synchronous calls for inference
Request concurrency: 1
Failed to maintain requested inference load. Worker thread(s) failed to generate concurrent requests.
Thread [0] had error: [request id: <id_unknown>] expected 16 string elements for inference input 'IN0', got 1
@nv-hwoo @Kanupriyagoyal
After some analysis, I identified that when we send JSON input to perf_analyzer over HTTP, it interprets the input tensor format as binary by default. Triton's http_server.cc contains specific logic that handles binary and byte data separately.
To resolve this, explicitly specify that the input format is JSON by using the following option:
--input-tensor-format json
This worked for me when sending JSON input over HTTP, and the string element count issue was resolved.
(Also make sure the endianness of the bytes is handled correctly.)
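For anyone hitting the same problem, a minimal end-to-end invocation combining the flat-content JSON above with that flag (host, port, and model name are placeholders):

perf_analyzer -m model_name -i http -u localhost:8000 \
    -b 1 --concurrency-range 1 \
    --input-data input_suggested.json --input-tensor-format json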