Parallel model inference flaky after upgrading Triton
Description I am upgrading Triton from version 21.05 to 22.06, and parallel model inference now seems flaky. Failures like this occur at random:
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2138112 for input 'images', expecting 2064384
My inference workload worked before the upgrade but breaks with this version of Triton. It is also confusing that the message says a specific byte size is expected even though the model config has flexible dims.
Triton Information What version of Triton are you using? 22.06
Are you using the Triton container or did you build it yourself? Triton container with:
- some additional pip dependencies installed
- a locally built Python backend, to pick up the args['model_repository'] fix from a separate issue.
To Reproduce I'm running the following script:
import logging
from concurrent import futures

import numpy as np
from tritonclient.grpc import InferenceServerClient, InferInput, InferRequestedOutput

logging.getLogger().setLevel(logging.INFO)

triton_client = InferenceServerClient('127.0.0.1:8001')

# Four requests per iteration, each with a different batch size and image shape.
SHAPES = [(8, 256, 384, 3), (6, 256, 336, 3), (2, 384, 256, 3), (6, 256, 256, 3)]

def run():
    for i in range(10):
        print(i)
        try:
            for shape in SHAPES:
                input_array = np.random.randint(0, 255, shape, dtype=np.uint8)
                model_input = InferInput('images', input_array.shape, 'UINT8')
                model_input.set_data_from_numpy(input_array)
                res = triton_client.infer(model_name='my_model',
                                          inputs=[model_input],
                                          outputs=[InferRequestedOutput('softmax')])
        except Exception as e:
            logging.info(e)

# The task list is named `tasks` so it does not shadow the `futures` module.
with futures.ThreadPoolExecutor(4) as pool:
    tasks = [pool.submit(run) for _ in range(4)]
    for t in tasks:
        t.result()
and seeing this output:
0
0
0
0
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2138112 for input 'images', expecting 2064384
1
1
1
1
2
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2138112 for input 'images', expecting 2064384
2
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2727936 for input 'images', expecting 2359296
3
2
2
3
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2138112 for input 'images', expecting 2064384
4
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2727936 for input 'images', expecting 2359296
5
3
3
4
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2138112 for input 'images', expecting 2064384
6
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2727936 for input 'images', expecting 2359296
7
4
4
5
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2138112 for input 'images', expecting 2064384
8
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2727936 for input 'images', expecting 2359296
6
5
5
9
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2138112 for input 'images', expecting 2064384
7
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2727936 for input 'images', expecting 2359296
6
6
8
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2138112 for input 'images', expecting 2064384
7
7
9
8
8
9
9
though the exact pattern changes every run. With fewer than 4 threads I don't see the issue, but with 4 or more threads at least one request fails.
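Doing the arithmetic on those byte sizes myself (a back-of-the-envelope check, not something Triton prints), each failure is exactly what I would get if two requests with different image shapes were batched together while the expected size was computed from a single request's shape:

# Per-sample byte counts for the shapes the script sends (uint8, so 1 byte per element).
bytes_256x336 = 256 * 336 * 3  # 258048
bytes_384x256 = 384 * 256 * 3  # 294912
bytes_256x256 = 256 * 256 * 3  # 196608

# "unexpected total byte size 2138112 ... expecting 2064384":
assert 6 * bytes_256x336 + 2 * bytes_384x256 == 2138112  # a [6,256,336,3] plus a [2,384,256,3] request
assert 8 * bytes_256x336 == 2064384                      # batch of 8, all assumed [256,336,3]

# "unexpected total byte size 2727936 ... expecting 2359296":
assert 6 * bytes_256x256 + 6 * bytes_256x336 == 2727936  # a [6,256,256,3] plus a [6,256,336,3] request
assert 12 * bytes_256x256 == 2359296                     # batch of 12, all assumed [256,256,3]

If that reading is right, it looks like the dynamic batcher is combining requests whose variable dims differ.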
My model config:
name: "my_model"
platform: "tensorflow_savedmodel"
max_batch_size: 16
input {
name: "images"
data_type: TYPE_UINT8
format: FORMAT_NHWC
dims: -1
dims: -1
dims: 3
}
output {
name: "features"
data_type: TYPE_FP32
dims: 1
dims: 1
dims: 1024
}
output {
name: "softmax"
data_type: TYPE_FP32
dims: 1
dims: 1
dims: 11043
label_filename: "labels.txt"
}
dynamic_batching {
preferred_batch_size: 4
max_queue_delay_microseconds: 500
}
Expected behavior I expect every inference request to succeed, as it did before the upgrade.
That is certainly strange; the error messages do look repetitive (the same wrong byte size paired with the same expected byte size). Can you run the server with verbose logging and attach the log as well? It would also help to specify a request id on each request so the client log can print which infer fails.
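A minimal sketch of what I mean, assuming the script above (tritonclient.grpc's infer() accepts an optional request_id string; thread_num and ordinal here are hypothetical placeholders for whatever identifies the call site):

# Hypothetical: tag each request so the verbose server log can be matched to the client call.
res = triton_client.infer(
    model_name='my_model',
    inputs=[model_input],
    outputs=[InferRequestedOutput('softmax')],
    request_id=f'Thread Num: {thread_num}, Iter: {i}, {ordinal}',  # thread_num/ordinal: hypothetical
)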
Here are the server logs:
I0801 13:59:29.354984 1 grpc_server.cc:3585] New request handler for ModelInferHandler, 0
I0801 13:59:29.355063 1 infer_request.cc:710] prepared: [0x0x7f382c001110] request id: Thread Num: 0, Iter: 0, First, model: my_model, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 8, priority: 0, timeout (us): 0
original inputs:
[0x0x7f38240f87a8] input: images, type: UINT8, original shape: [8,256,384,3], batch + shape: [8,256,384,3], shape: [256,384,3]
override inputs:
inputs:
[0x0x7f38240f87a8] input: images, type: UINT8, original shape: [8,256,384,3], batch + shape: [8,256,384,3], shape: [256,384,3]
original requested outputs:
softmax
requested outputs:
softmax
I0801 13:59:29.355320 1 tensorflow.cc:2430] model my_model, instance my_model, executing 1 requests
I0801 13:59:29.355367 1 tensorflow.cc:1598] TRITONBACKEND_ModelExecute: Running my_model with 1 requests
I0801 13:59:29.355930 1 tensorflow.cc:1850] TRITONBACKEND_ModelExecute: input 'images' is GPU tensor: false
I0801 13:59:29.378718 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step START
I0801 13:59:29.378784 1 grpc_server.cc:3585] New request handler for ModelInferHandler, 0
I0801 13:59:29.378834 1 infer_request.cc:710] prepared: [0x0x7f382d36e830] request id: Thread Num: 1, Iter: 0, First, model: my_model, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 8, priority: 0, timeout (us): 0
original inputs:
[0x0x7f382d36eb98] input: images, type: UINT8, original shape: [8,256,384,3], batch + shape: [8,256,384,3], shape: [256,384,3]
override inputs:
inputs:
[0x0x7f382d36eb98] input: images, type: UINT8, original shape: [8,256,384,3], batch + shape: [8,256,384,3], shape: [256,384,3]
original requested outputs:
softmax
requested outputs:
softmax
I0801 13:59:29.395356 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step START
I0801 13:59:29.395410 1 grpc_server.cc:3585] New request handler for ModelInferHandler, 0
I0801 13:59:29.395461 1 infer_request.cc:710] prepared: [0x0x7f3824001110] request id: Thread Num: 2, Iter: 0, First, model: my_model, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 8, priority: 0, timeout (us): 0
original inputs:
[0x0x7f3826baf948] input: images, type: UINT8, original shape: [8,256,384,3], batch + shape: [8,256,384,3], shape: [256,384,3]
override inputs:
inputs:
[0x0x7f3826baf948] input: images, type: UINT8, original shape: [8,256,384,3], batch + shape: [8,256,384,3], shape: [256,384,3]
original requested outputs:
softmax
requested outputs:
softmax
I0801 13:59:29.408964 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step START
I0801 13:59:29.409021 1 grpc_server.cc:3585] New request handler for ModelInferHandler, 0
I0801 13:59:29.409068 1 infer_request.cc:710] prepared: [0x0x7f382d380e20] request id: Thread Num: 3, Iter: 0, First, model: my_model, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 8, priority: 0, timeout (us): 0
original inputs:
[0x0x7f382d381188] input: images, type: UINT8, original shape: [8,256,384,3], batch + shape: [8,256,384,3], shape: [256,384,3]
override inputs:
inputs:
[0x0x7f382d381188] input: images, type: UINT8, original shape: [8,256,384,3], batch + shape: [8,256,384,3], shape: [256,384,3]
original requested outputs:
softmax
requested outputs:
softmax
I0801 13:59:29.862508 1 grpc_server.cc:270] Process for ServerLive, rpc_ok=1, 1044 step START
I0801 13:59:29.862563 1 grpc_server.cc:225] Ready for RPC 'ServerLive', 1045
I0801 13:59:29.862695 1 grpc_server.cc:270] Process for ServerLive, rpc_ok=1, 1044 step COMPLETE
I0801 13:59:29.862712 1 grpc_server.cc:411] Done for ServerLive, 1044
I0801 13:59:30.863553 1 grpc_server.cc:270] Process for ServerLive, rpc_ok=1, 1045 step START
I0801 13:59:30.863619 1 grpc_server.cc:225] Ready for RPC 'ServerLive', 1046
I0801 13:59:30.863767 1 grpc_server.cc:270] Process for ServerLive, rpc_ok=1, 1045 step COMPLETE
I0801 13:59:30.863783 1 grpc_server.cc:411] Done for ServerLive, 1045
I0801 13:59:31.865621 1 grpc_server.cc:270] Process for ServerLive, rpc_ok=1, 1046 step START
I0801 13:59:31.865703 1 grpc_server.cc:225] Ready for RPC 'ServerLive', 1047
I0801 13:59:31.865940 1 grpc_server.cc:270] Process for ServerLive, rpc_ok=1, 1046 step COMPLETE
I0801 13:59:31.865965 1 grpc_server.cc:411] Done for ServerLive, 1046
I0801 13:59:32.039443 1 http_server.cc:3203] HTTP request: 0 /v2/health/ready
I0801 13:59:32.134106 1 http_server.cc:3203] HTTP request: 0 /v2/health/live
I0801 13:59:32.520712 1 infer_response.cc:167] add response output: output: softmax, type: FP32, shape: [8,1,1,11043]
I0801 13:59:32.520828 1 grpc_server.cc:2592] GRPC: using buffer for 'softmax', size: 353376, addr: 0x7f382d3c6ce0
I0801 13:59:32.520894 1 tensorflow.cc:2124] TRITONBACKEND_ModelExecute: output 'softmax' is GPU tensor: false
I0801 13:59:32.520908 1 grpc_server.cc:3744] ModelInferHandler::InferResponseComplete, 0 step ISSUED
I0801 13:59:32.520927 1 grpc_server.cc:2712] GRPC free: size 353376, addr 0x7f382d3c6ce0
I0801 13:59:32.521286 1 grpc_server.cc:3310] ModelInferHandler::InferRequestComplete
I0801 13:59:32.521306 1 tensorflow.cc:2182] TRITONBACKEND_ModelExecute: model my_model released 1 requests
I0801 13:59:32.521301 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE
I0801 13:59:32.521337 1 grpc_server.cc:2502] Done for ModelInferHandler, 0
I0801 13:59:32.521369 1 tensorflow.cc:2430] model my_model, instance my_model, executing 2 requests
I0801 13:59:32.521379 1 tensorflow.cc:1598] TRITONBACKEND_ModelExecute: Running my_model with 2 requests
I0801 13:59:32.522058 1 tensorflow.cc:1850] TRITONBACKEND_ModelExecute: input 'images' is GPU tensor: false
I0801 13:59:32.547130 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step START
I0801 13:59:32.547171 1 grpc_server.cc:3585] New request handler for ModelInferHandler, 0
I0801 13:59:32.547210 1 infer_request.cc:710] prepared: [0x0x7f3826eabc70] request id: Thread Num: 0, Iter: 0, Second, model: my_model, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 6, priority: 0, timeout (us): 0
original inputs:
[0x0x7f3826b81458] input: images, type: UINT8, original shape: [6,256,336,3], batch + shape: [6,256,336,3], shape: [256,336,3]
override inputs:
inputs:
[0x0x7f3826b81458] input: images, type: UINT8, original shape: [6,256,336,3], batch + shape: [6,256,336,3], shape: [256,336,3]
original requested outputs:
softmax
requested outputs:
softmax
I0801 13:59:32.684467 1 infer_response.cc:167] add response output: output: softmax, type: FP32, shape: [8,1,1,11043]
I0801 13:59:32.684602 1 grpc_server.cc:2592] GRPC: using buffer for 'softmax', size: 353376, addr: 0x7f382d41d150
I0801 13:59:32.684665 1 infer_response.cc:167] add response output: output: softmax, type: FP32, shape: [8,1,1,11043]
I0801 13:59:32.684746 1 grpc_server.cc:2592] GRPC: using buffer for 'softmax', size: 353376, addr: 0x7f382d4735c0
I0801 13:59:32.684796 1 tensorflow.cc:2124] TRITONBACKEND_ModelExecute: output 'softmax' is GPU tensor: false
I0801 13:59:32.684811 1 grpc_server.cc:3744] ModelInferHandler::InferResponseComplete, 0 step ISSUED
I0801 13:59:32.684837 1 grpc_server.cc:2712] GRPC free: size 353376, addr 0x7f382d41d150
I0801 13:59:32.685309 1 grpc_server.cc:3744] ModelInferHandler::InferResponseComplete, 0 step ISSUED
I0801 13:59:32.685330 1 grpc_server.cc:2712] GRPC free: size 353376, addr 0x7f382d4735c0
I0801 13:59:32.685335 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE
I0801 13:59:32.685358 1 grpc_server.cc:2502] Done for ModelInferHandler, 0
I0801 13:59:32.685612 1 grpc_server.cc:3310] ModelInferHandler::InferRequestComplete
I0801 13:59:32.685631 1 grpc_server.cc:3310] ModelInferHandler::InferRequestComplete
I0801 13:59:32.685637 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE
I0801 13:59:32.685649 1 tensorflow.cc:2182] TRITONBACKEND_ModelExecute: model my_model released 2 requests
I0801 13:59:32.685655 1 grpc_server.cc:2502] Done for ModelInferHandler, 0
I0801 13:59:32.685713 1 tensorflow.cc:2430] model my_model, instance my_model, executing 2 requests
I0801 13:59:32.685727 1 tensorflow.cc:1598] TRITONBACKEND_ModelExecute: Running my_model with 2 requests
I0801 13:59:32.686336 1 tensorflow.cc:1850] TRITONBACKEND_ModelExecute: input 'images' is GPU tensor: false
I0801 13:59:32.699081 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step START
I0801 13:59:32.699141 1 grpc_server.cc:3585] New request handler for ModelInferHandler, 0
I0801 13:59:32.699256 1 infer_request.cc:710] prepared: [0x0x7f382c026a50] request id: Thread Num: 1, Iter: 0, Second, model: my_model, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 6, priority: 0, timeout (us): 0
original inputs:
[0x0x7f382c0c1b58] input: images, type: UINT8, original shape: [6,256,336,3], batch + shape: [6,256,336,3], shape: [256,336,3]
override inputs:
inputs:
[0x0x7f382c0c1b58] input: images, type: UINT8, original shape: [6,256,336,3], batch + shape: [6,256,336,3], shape: [256,336,3]
original requested outputs:
softmax
requested outputs:
softmax
I0801 13:59:32.710914 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step START
I0801 13:59:32.710971 1 grpc_server.cc:3585] New request handler for ModelInferHandler, 0
I0801 13:59:32.711030 1 infer_request.cc:710] prepared: [0x0x7f382c01acc0] request id: Thread Num: 2, Iter: 0, Second, model: my_model, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 6, priority: 0, timeout (us): 0
original inputs:
[0x0x7f378035fef8] input: images, type: UINT8, original shape: [6,256,336,3], batch + shape: [6,256,336,3], shape: [256,336,3]
override inputs:
inputs:
[0x0x7f378035fef8] input: images, type: UINT8, original shape: [6,256,336,3], batch + shape: [6,256,336,3], shape: [256,336,3]
original requested outputs:
softmax
requested outputs:
softmax
I0801 13:59:32.824167 1 infer_response.cc:167] add response output: output: softmax, type: FP32, shape: [8,1,1,11043]
I0801 13:59:32.824312 1 grpc_server.cc:2592] GRPC: using buffer for 'softmax', size: 353376, addr: 0x7f382d7785a0
I0801 13:59:32.824387 1 infer_response.cc:167] add response output: output: softmax, type: FP32, shape: [6,1,1,11043]
I0801 13:59:32.824468 1 grpc_server.cc:2592] GRPC: using buffer for 'softmax', size: 265032, addr: 0x7f382ef73f70
I0801 13:59:32.824519 1 tensorflow.cc:2124] TRITONBACKEND_ModelExecute: output 'softmax' is GPU tensor: false
I0801 13:59:32.824532 1 grpc_server.cc:3744] ModelInferHandler::InferResponseComplete, 0 step ISSUED
I0801 13:59:32.824551 1 grpc_server.cc:2712] GRPC free: size 353376, addr 0x7f382d7785a0
I0801 13:59:32.824960 1 grpc_server.cc:3744] ModelInferHandler::InferResponseComplete, 0 step ISSUED
I0801 13:59:32.824978 1 grpc_server.cc:2712] GRPC free: size 265032, addr 0x7f382ef73f70
I0801 13:59:32.824994 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE
I0801 13:59:32.825024 1 grpc_server.cc:2502] Done for ModelInferHandler, 0
I0801 13:59:32.825170 1 grpc_server.cc:3310] ModelInferHandler::InferRequestComplete
I0801 13:59:32.825186 1 grpc_server.cc:3310] ModelInferHandler::InferRequestComplete
I0801 13:59:32.825194 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE
I0801 13:59:32.825200 1 tensorflow.cc:2182] TRITONBACKEND_ModelExecute: model my_model released 2 requests
I0801 13:59:32.825220 1 grpc_server.cc:2502] Done for ModelInferHandler, 0
I0801 13:59:32.825251 1 tensorflow.cc:2430] model my_model, instance my_model, executing 2 requests
I0801 13:59:32.825263 1 tensorflow.cc:1598] TRITONBACKEND_ModelExecute: Running my_model with 2 requests
I0801 13:59:32.825749 1 tensorflow.cc:1850] TRITONBACKEND_ModelExecute: input 'images' is GPU tensor: false
I0801 13:59:32.841311 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step START
I0801 13:59:32.841373 1 grpc_server.cc:3585] New request handler for ModelInferHandler, 0
I0801 13:59:32.841435 1 infer_request.cc:710] prepared: [0x0x7f3826eabc70] request id: Thread Num: 3, Iter: 0, Second, model: my_model, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 6, priority: 0, timeout (us): 0
original inputs:
[0x0x7f3827114988] input: images, type: UINT8, original shape: [6,256,336,3], batch + shape: [6,256,336,3], shape: [256,336,3]
override inputs:
inputs:
[0x0x7f3827114988] input: images, type: UINT8, original shape: [6,256,336,3], batch + shape: [6,256,336,3], shape: [256,336,3]
original requested outputs:
softmax
requested outputs:
softmax
I0801 13:59:32.843217 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step START
I0801 13:59:32.843246 1 grpc_server.cc:3585] New request handler for ModelInferHandler, 0
I0801 13:59:32.843273 1 infer_request.cc:710] prepared: [0x0x7f382d36d8c0] request id: Thread Num: 0, Iter: 0, Third, model: my_model, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 2, priority: 0, timeout (us): 0
original inputs:
[0x0x7f382d3ab808] input: images, type: UINT8, original shape: [2,384,256,3], batch + shape: [2,384,256,3], shape: [384,256,3]
override inputs:
inputs:
[0x0x7f382d3ab808] input: images, type: UINT8, original shape: [2,384,256,3], batch + shape: [2,384,256,3], shape: [384,256,3]
original requested outputs:
softmax
requested outputs:
softmax
I0801 13:59:32.867758 1 grpc_server.cc:270] Process for ServerLive, rpc_ok=1, 1047 step START
I0801 13:59:32.867800 1 grpc_server.cc:225] Ready for RPC 'ServerLive', 1048
I0801 13:59:32.867938 1 grpc_server.cc:270] Process for ServerLive, rpc_ok=1, 1047 step COMPLETE
I0801 13:59:32.867989 1 grpc_server.cc:411] Done for ServerLive, 1047
I0801 13:59:32.921268 1 infer_response.cc:167] add response output: output: softmax, type: FP32, shape: [6,1,1,11043]
I0801 13:59:32.921394 1 grpc_server.cc:2592] GRPC: using buffer for 'softmax', size: 265032, addr: 0x7f382d7cea10
I0801 13:59:32.921445 1 infer_response.cc:167] add response output: output: softmax, type: FP32, shape: [6,1,1,11043]
I0801 13:59:32.921526 1 grpc_server.cc:2592] GRPC: using buffer for 'softmax', size: 265032, addr: 0x7f382d80f570
I0801 13:59:32.921573 1 tensorflow.cc:2124] TRITONBACKEND_ModelExecute: output 'softmax' is GPU tensor: false
I0801 13:59:32.921587 1 grpc_server.cc:3744] ModelInferHandler::InferResponseComplete, 0 step ISSUED
I0801 13:59:32.921606 1 grpc_server.cc:2712] GRPC free: size 265032, addr 0x7f382d7cea10
I0801 13:59:32.921954 1 grpc_server.cc:3744] ModelInferHandler::InferResponseComplete, 0 step ISSUED
I0801 13:59:32.921972 1 grpc_server.cc:2712] GRPC free: size 265032, addr 0x7f382d80f570
I0801 13:59:32.921995 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE
I0801 13:59:32.922018 1 grpc_server.cc:2502] Done for ModelInferHandler, 0
I0801 13:59:32.922160 1 grpc_server.cc:3310] ModelInferHandler::InferRequestComplete
I0801 13:59:32.922179 1 grpc_server.cc:3310] ModelInferHandler::InferRequestComplete
I0801 13:59:32.922183 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE
I0801 13:59:32.922193 1 tensorflow.cc:2182] TRITONBACKEND_ModelExecute: model my_model released 2 requests
I0801 13:59:32.922194 1 grpc_server.cc:2502] Done for ModelInferHandler, 0
I0801 13:59:32.922256 1 tensorflow.cc:2430] model my_model, instance my_model, executing 2 requests
I0801 13:59:32.922268 1 tensorflow.cc:1598] TRITONBACKEND_ModelExecute: Running my_model with 2 requests
I0801 13:59:32.922514 1 grpc_server.cc:3744] ModelInferHandler::InferResponseComplete, 0 step ISSUED
I0801 13:59:32.922606 1 tensorflow.cc:1850] TRITONBACKEND_ModelExecute: input 'images' is GPU tensor: false
I0801 13:59:32.922628 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE
I0801 13:59:32.922657 1 grpc_server.cc:2502] Done for ModelInferHandler, 0
I0801 13:59:32.926624 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step START
I0801 13:59:32.926673 1 grpc_server.cc:3585] New request handler for ModelInferHandler, 0
I0801 13:59:32.926764 1 infer_request.cc:710] prepared: [0x0x7f382c026a50] request id: Thread Num: 1, Iter: 0, Third, model: my_model, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 2, priority: 0, timeout (us): 0
original inputs:
[0x0x7f382d381428] input: images, type: UINT8, original shape: [2,384,256,3], batch + shape: [2,384,256,3], shape: [384,256,3]
override inputs:
inputs:
[0x0x7f382d381428] input: images, type: UINT8, original shape: [2,384,256,3], batch + shape: [2,384,256,3], shape: [384,256,3]
original requested outputs:
softmax
requested outputs:
softmax
I0801 13:59:32.930596 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step START
I0801 13:59:32.930629 1 grpc_server.cc:3585] New request handler for ModelInferHandler, 0
I0801 13:59:32.930664 1 infer_request.cc:710] prepared: [0x0x7f3793589960] request id: Thread Num: 2, Iter: 0, Third, model: my_model, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 2, priority: 0, timeout (us): 0
original inputs:
[0x0x7f38240f8908] input: images, type: UINT8, original shape: [2,384,256,3], batch + shape: [2,384,256,3], shape: [384,256,3]
override inputs:
inputs:
[0x0x7f38240f8908] input: images, type: UINT8, original shape: [2,384,256,3], batch + shape: [2,384,256,3], shape: [384,256,3]
original requested outputs:
softmax
requested outputs:
softmax
I0801 13:59:32.949829 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step START
I0801 13:59:32.949872 1 grpc_server.cc:3585] New request handler for ModelInferHandler, 0
I0801 13:59:32.949913 1 infer_request.cc:710] prepared: [0x0x7f38244cd4d0] request id: Thread Num: 0, Iter: 1, First, model: my_model, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 8, priority: 0, timeout (us): 0
original inputs:
[0x0x7f38244cd808] input: images, type: UINT8, original shape: [8,256,384,3], batch + shape: [8,256,384,3], shape: [256,384,3]
override inputs:
inputs:
[0x0x7f38244cd808] input: images, type: UINT8, original shape: [8,256,384,3], batch + shape: [8,256,384,3], shape: [256,384,3]
original requested outputs:
softmax
requested outputs:
softmax
I0801 13:59:32.991390 1 infer_response.cc:167] add response output: output: softmax, type: FP32, shape: [6,1,1,11043]
I0801 13:59:32.991480 1 grpc_server.cc:2592] GRPC: using buffer for 'softmax', size: 265032, addr: 0x7f382d4735c0
I0801 13:59:32.991534 1 tensorflow.cc:2124] TRITONBACKEND_ModelExecute: output 'softmax' is GPU tensor: false
I0801 13:59:32.991546 1 grpc_server.cc:3744] ModelInferHandler::InferResponseComplete, 0 step ISSUED
I0801 13:59:32.991567 1 grpc_server.cc:2712] GRPC free: size 265032, addr 0x7f382d4735c0
I0801 13:59:32.991925 1 grpc_server.cc:3310] ModelInferHandler::InferRequestComplete
I0801 13:59:32.991941 1 grpc_server.cc:3310] ModelInferHandler::InferRequestComplete
I0801 13:59:32.991939 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE
I0801 13:59:32.991998 1 tensorflow.cc:2182] TRITONBACKEND_ModelExecute: model my_model released 2 requests
I0801 13:59:32.992006 1 grpc_server.cc:2502] Done for ModelInferHandler, 0
I0801 13:59:32.992039 1 tensorflow.cc:2430] model my_model, instance my_model, executing 2 requests
I0801 13:59:32.992053 1 tensorflow.cc:1598] TRITONBACKEND_ModelExecute: Running my_model with 2 requests
I0801 13:59:32.992483 1 tensorflow.cc:1850] TRITONBACKEND_ModelExecute: input 'images' is GPU tensor: false
I0801 13:59:32.997894 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step START
I0801 13:59:32.997931 1 grpc_server.cc:3585] New request handler for ModelInferHandler, 0
I0801 13:59:32.997968 1 infer_request.cc:710] prepared: [0x0x7f3826eabc70] request id: Thread Num: 3, Iter: 0, Third, model: my_model, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 2, priority: 0, timeout (us): 0
original inputs:
[0x0x7f3793589568] input: images, type: UINT8, original shape: [2,384,256,3], batch + shape: [2,384,256,3], shape: [384,256,3]
override inputs:
inputs:
[0x0x7f3793589568] input: images, type: UINT8, original shape: [2,384,256,3], batch + shape: [2,384,256,3], shape: [384,256,3]
original requested outputs:
softmax
requested outputs:
softmax
I0801 13:59:33.035820 1 infer_response.cc:167] add response output: output: softmax, type: FP32, shape: [2,1,1,11043]
I0801 13:59:33.035883 1 grpc_server.cc:2592] GRPC: using buffer for 'softmax', size: 88344, addr: 0x7f382d41d150
I0801 13:59:33.035911 1 infer_response.cc:167] add response output: output: softmax, type: FP32, shape: [2,1,1,11043]
I0801 13:59:33.035944 1 grpc_server.cc:2592] GRPC: using buffer for 'softmax', size: 88344, addr: 0x7f382ef73f70
I0801 13:59:33.036004 1 tensorflow.cc:2124] TRITONBACKEND_ModelExecute: output 'softmax' is GPU tensor: false
I0801 13:59:33.036017 1 grpc_server.cc:3744] ModelInferHandler::InferResponseComplete, 0 step ISSUED
I0801 13:59:33.036035 1 grpc_server.cc:2712] GRPC free: size 88344, addr 0x7f382d41d150
I0801 13:59:33.036268 1 grpc_server.cc:3744] ModelInferHandler::InferResponseComplete, 0 step ISSUED
I0801 13:59:33.036284 1 grpc_server.cc:2712] GRPC free: size 88344, addr 0x7f382ef73f70
I0801 13:59:33.036287 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE
I0801 13:59:33.036307 1 grpc_server.cc:2502] Done for ModelInferHandler, 0
I0801 13:59:33.036398 1 grpc_server.cc:3310] ModelInferHandler::InferRequestComplete
I0801 13:59:33.036413 1 grpc_server.cc:3310] ModelInferHandler::InferRequestComplete
I0801 13:59:33.036417 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE
I0801 13:59:33.036427 1 tensorflow.cc:2182] TRITONBACKEND_ModelExecute: model my_model released 2 requests
I0801 13:59:33.036428 1 grpc_server.cc:2502] Done for ModelInferHandler, 0
I0801 13:59:33.036467 1 tensorflow.cc:2430] model my_model, instance my_model, executing 2 requests
I0801 13:59:33.036477 1 tensorflow.cc:1598] TRITONBACKEND_ModelExecute: Running my_model with 2 requests
I0801 13:59:33.037073 1 tensorflow.cc:1850] TRITONBACKEND_ModelExecute: input 'images' is GPU tensor: false
I0801 13:59:33.044950 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step START
I0801 13:59:33.045013 1 grpc_server.cc:3585] New request handler for ModelInferHandler, 0
I0801 13:59:33.052069 1 infer_request.cc:710] prepared: [0x0x7f382d36d8c0] request id: Thread Num: 1, Iter: 0, Fourth, model: my_model, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 6, priority: 0, timeout (us): 0
original inputs:
[0x0x7f382c080b08] input: images, type: UINT8, original shape: [6,256,256,3], batch + shape: [6,256,256,3], shape: [256,256,3]
override inputs:
inputs:
[0x0x7f382c080b08] input: images, type: UINT8, original shape: [6,256,256,3], batch + shape: [6,256,256,3], shape: [256,256,3]
original requested outputs:
softmax
requested outputs:
softmax
I0801 13:59:33.053370 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step START
I0801 13:59:33.053411 1 grpc_server.cc:3585] New request handler for ModelInferHandler, 0
I0801 13:59:33.053442 1 infer_request.cc:710] prepared: [0x0x7f382403eac0] request id: Thread Num: 2, Iter: 0, Fourth, model: my_model, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 6, priority: 0, timeout (us): 0
original inputs:
[0x0x7f3793589428] input: images, type: UINT8, original shape: [6,256,256,3], batch + shape: [6,256,256,3], shape: [256,256,3]
override inputs:
inputs:
[0x0x7f3793589428] input: images, type: UINT8, original shape: [6,256,256,3], batch + shape: [6,256,256,3], shape: [256,256,3]
original requested outputs:
softmax
requested outputs:
softmax
I0801 13:59:33.140703 1 infer_response.cc:167] add response output: output: softmax, type: FP32, shape: [8,1,1,11043]
I0801 13:59:33.140839 1 grpc_server.cc:2592] GRPC: using buffer for 'softmax', size: 353376, addr: 0x7f382d8500d0
I0801 13:59:33.140898 1 infer_response.cc:167] add response output: output: softmax, type: FP32, shape: [2,1,1,11043]
I0801 13:59:33.140926 1 grpc_server.cc:2592] GRPC: using buffer for 'softmax', size: 88344, addr: 0x7f382d3c6ce0
I0801 13:59:33.140949 1 tensorflow.cc:2124] TRITONBACKEND_ModelExecute: output 'softmax' is GPU tensor: false
I0801 13:59:33.140959 1 grpc_server.cc:3744] ModelInferHandler::InferResponseComplete, 0 step ISSUED
I0801 13:59:33.140978 1 grpc_server.cc:2712] GRPC free: size 353376, addr 0x7f382d8500d0
I0801 13:59:33.141401 1 grpc_server.cc:3744] ModelInferHandler::InferResponseComplete, 0 step ISSUED
I0801 13:59:33.141418 1 grpc_server.cc:2712] GRPC free: size 88344, addr 0x7f382d3c6ce0
I0801 13:59:33.141430 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE
I0801 13:59:33.141455 1 grpc_server.cc:2502] Done for ModelInferHandler, 0
I0801 13:59:33.141582 1 grpc_server.cc:3310] ModelInferHandler::InferRequestComplete
I0801 13:59:33.141588 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE
I0801 13:59:33.141598 1 grpc_server.cc:3310] ModelInferHandler::InferRequestComplete
I0801 13:59:33.141601 1 grpc_server.cc:2502] Done for ModelInferHandler, 0
I0801 13:59:33.141615 1 tensorflow.cc:2182] TRITONBACKEND_ModelExecute: model my_model released 2 requests
I0801 13:59:33.141653 1 tensorflow.cc:2430] model my_model, instance my_model, executing 2 requests
I0801 13:59:33.141663 1 tensorflow.cc:1598] TRITONBACKEND_ModelExecute: Running my_model with 2 requests
I0801 13:59:33.142041 1 tensorflow.cc:1850] TRITONBACKEND_ModelExecute: input 'images' is GPU tensor: false
I0801 13:59:33.154173 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step START
I0801 13:59:33.154229 1 grpc_server.cc:3585] New request handler for ModelInferHandler, 0
I0801 13:59:33.154273 1 infer_request.cc:710] prepared: [0x0x7f382d380e20] request id: Thread Num: 0, Iter: 1, Second, model: my_model, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 6, priority: 0, timeout (us): 0
original inputs:
[0x0x7f382d36d768] input: images, type: UINT8, original shape: [6,256,336,3], batch + shape: [6,256,336,3], shape: [256,336,3]
override inputs:
inputs:
[0x0x7f382d36d768] input: images, type: UINT8, original shape: [6,256,336,3], batch + shape: [6,256,336,3], shape: [256,336,3]
original requested outputs:
softmax
requested outputs:
softmax
I0801 13:59:33.162141 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step START
I0801 13:59:33.162206 1 grpc_server.cc:3585] New request handler for ModelInferHandler, 0
I0801 13:59:33.162419 1 infer_request.cc:710] prepared: [0x0x7f382d37feb0] request id: Thread Num: 3, Iter: 0, Fourth, model: my_model, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 6, priority: 0, timeout (us): 0
original inputs:
[0x0x7f38244e1b78] input: images, type: UINT8, original shape: [6,256,256,3], batch + shape: [6,256,256,3], shape: [256,256,3]
override inputs:
inputs:
[0x0x7f38244e1b78] input: images, type: UINT8, original shape: [6,256,256,3], batch + shape: [6,256,256,3], shape: [256,256,3]
original requested outputs:
softmax
requested outputs:
softmax
I0801 13:59:33.211626 1 infer_response.cc:167] add response output: output: softmax, type: FP32, shape: [6,1,1,11043]
I0801 13:59:33.211745 1 grpc_server.cc:2592] GRPC: using buffer for 'softmax', size: 265032, addr: 0x7f382d80f570
I0801 13:59:33.211801 1 infer_response.cc:167] add response output: output: softmax, type: FP32, shape: [6,1,1,11043]
I0801 13:59:33.211861 1 grpc_server.cc:2592] GRPC: using buffer for 'softmax', size: 265032, addr: 0x7f382d4735c0
I0801 13:59:33.211910 1 tensorflow.cc:2124] TRITONBACKEND_ModelExecute: output 'softmax' is GPU tensor: false
I0801 13:59:33.211923 1 grpc_server.cc:3744] ModelInferHandler::InferResponseComplete, 0 step ISSUED
I0801 13:59:33.211942 1 grpc_server.cc:2712] GRPC free: size 265032, addr 0x7f382d80f570
I0801 13:59:33.212285 1 grpc_server.cc:3744] ModelInferHandler::InferResponseComplete, 0 step ISSUED
I0801 13:59:33.212305 1 grpc_server.cc:2712] GRPC free: size 265032, addr 0x7f382d4735c0
I0801 13:59:33.212324 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE
I0801 13:59:33.212347 1 grpc_server.cc:2502] Done for ModelInferHandler, 0
I0801 13:59:33.212551 1 grpc_server.cc:3310] ModelInferHandler::InferRequestComplete
I0801 13:59:33.212573 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE
I0801 13:59:33.212604 1 grpc_server.cc:3310] ModelInferHandler::InferRequestComplete
I0801 13:59:33.212608 1 grpc_server.cc:2502] Done for ModelInferHandler, 0
I0801 13:59:33.212629 1 tensorflow.cc:2182] TRITONBACKEND_ModelExecute: model my_model released 2 requests
I0801 13:59:33.212678 1 tensorflow.cc:2430] model my_model, instance my_model, executing 2 requests
I0801 13:59:33.212694 1 tensorflow.cc:1598] TRITONBACKEND_ModelExecute: Running my_model with 2 requests
I0801 13:59:33.213136 1 tensorflow.cc:1850] TRITONBACKEND_ModelExecute: input 'images' is GPU tensor: false
I0801 13:59:33.230853 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step START
I0801 13:59:33.230911 1 grpc_server.cc:3585] New request handler for ModelInferHandler, 0
I0801 13:59:33.230955 1 infer_request.cc:710] prepared: [0x0x7f379363a170] request id: Thread Num: 1, Iter: 1, First, model: my_model, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 8, priority: 0, timeout (us): 0
original inputs:
[0x0x7f379363a4a8] input: images, type: UINT8, original shape: [8,256,384,3], batch + shape: [8,256,384,3], shape: [256,384,3]
override inputs:
inputs:
[0x0x7f379363a4a8] input: images, type: UINT8, original shape: [8,256,384,3], batch + shape: [8,256,384,3], shape: [256,384,3]
original requested outputs:
softmax
requested outputs:
softmax
I0801 13:59:33.247538 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step START
I0801 13:59:33.247600 1 grpc_server.cc:3585] New request handler for ModelInferHandler, 0
I0801 13:59:33.247676 1 infer_request.cc:710] prepared: [0x0x7f3793588650] request id: Thread Num: 2, Iter: 1, First, model: my_model, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 8, priority: 0, timeout (us): 0
original inputs:
[0x0x7f382c080b08] input: images, type: UINT8, original shape: [8,256,384,3], batch + shape: [8,256,384,3], shape: [256,384,3]
override inputs:
inputs:
[0x0x7f382c080b08] input: images, type: UINT8, original shape: [8,256,384,3], batch + shape: [8,256,384,3], shape: [256,384,3]
original requested outputs:
softmax
requested outputs:
softmax
I0801 13:59:33.308726 1 infer_response.cc:167] add response output: output: softmax, type: FP32, shape: [6,1,1,11043]
I0801 13:59:33.308863 1 grpc_server.cc:2592] GRPC: using buffer for 'softmax', size: 265032, addr: 0x7f382d7cea10
I0801 13:59:33.308931 1 infer_response.cc:167] add response output: output: softmax, type: FP32, shape: [6,1,1,11043]
I0801 13:59:33.309019 1 grpc_server.cc:2592] GRPC: using buffer for 'softmax', size: 265032, addr: 0x7f382ef73f70
I0801 13:59:33.309075 1 tensorflow.cc:2124] TRITONBACKEND_ModelExecute: output 'softmax' is GPU tensor: false
I0801 13:59:33.309095 1 grpc_server.cc:3744] ModelInferHandler::InferResponseComplete, 0 step ISSUED
I0801 13:59:33.309120 1 grpc_server.cc:2712] GRPC free: size 265032, addr 0x7f382d7cea10
I0801 13:59:33.309550 1 grpc_server.cc:3744] ModelInferHandler::InferResponseComplete, 0 step ISSUED
I0801 13:59:33.309565 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE
I0801 13:59:33.309585 1 grpc_server.cc:2502] Done for ModelInferHandler, 0
I0801 13:59:33.309596 1 grpc_server.cc:2712] GRPC free: size 265032, addr 0x7f382ef73f70
I0801 13:59:33.309836 1 grpc_server.cc:3310] ModelInferHandler::InferRequestComplete
I0801 13:59:33.309841 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE
I0801 13:59:33.309866 1 grpc_server.cc:3310] ModelInferHandler::InferRequestComplete
I0801 13:59:33.309869 1 grpc_server.cc:2502] Done for ModelInferHandler, 0
I0801 13:59:33.309891 1 tensorflow.cc:2182] TRITONBACKEND_ModelExecute: model my_model released 2 requests
I0801 13:59:33.309937 1 tensorflow.cc:2430] model my_model, instance my_model, executing 2 requests
I0801 13:59:33.309953 1 tensorflow.cc:1598] TRITONBACKEND_ModelExecute: Running my_model with 2 requests
I0801 13:59:33.310720 1 tensorflow.cc:1850] TRITONBACKEND_ModelExecute: input 'images' is GPU tensor: false
I0801 13:59:33.320459 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step START
I0801 13:59:33.320520 1 grpc_server.cc:3585] New request handler for ModelInferHandler, 0
I0801 13:59:33.320613 1 infer_request.cc:710] prepared: [0x0x7f382d37feb0] request id: Thread Num: 0, Iter: 1, Third, model: my_model, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 2, priority: 0, timeout (us): 0
original inputs:
[0x0x7f378035fd88] input: images, type: UINT8, original shape: [2,384,256,3], batch + shape: [2,384,256,3], shape: [384,256,3]
override inputs:
inputs:
[0x0x7f378035fd88] input: images, type: UINT8, original shape: [2,384,256,3], batch + shape: [2,384,256,3], shape: [384,256,3]
original requested outputs:
softmax
requested outputs:
softmax
I0801 13:59:33.336410 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step START
I0801 13:59:33.336460 1 grpc_server.cc:3585] New request handler for ModelInferHandler, 0
I0801 13:59:33.336511 1 infer_request.cc:710] prepared: [0x0x7f382c01acc0] request id: Thread Num: 3, Iter: 1, First, model: my_model, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 8, priority: 0, timeout (us): 0
original inputs:
[0x0x7f37935c98c8] input: images, type: UINT8, original shape: [8,256,384,3], batch + shape: [8,256,384,3], shape: [256,384,3]
override inputs:
inputs:
[0x0x7f37935c98c8] input: images, type: UINT8, original shape: [8,256,384,3], batch + shape: [8,256,384,3], shape: [256,384,3]
original requested outputs:
softmax
requested outputs:
softmax
I0801 13:59:33.456399 1 infer_response.cc:167] add response output: output: softmax, type: FP32, shape: [8,1,1,11043]
I0801 13:59:33.456538 1 grpc_server.cc:2592] GRPC: using buffer for 'softmax', size: 353376, addr: 0x7f382d7785a0
I0801 13:59:33.456602 1 infer_response.cc:167] add response output: output: softmax, type: FP32, shape: [8,1,1,11043]
I0801 13:59:33.456676 1 grpc_server.cc:2592] GRPC: using buffer for 'softmax', size: 353376, addr: 0x7f382d3c6ce0
I0801 13:59:33.456734 1 tensorflow.cc:2124] TRITONBACKEND_ModelExecute: output 'softmax' is GPU tensor: false
I0801 13:59:33.456747 1 grpc_server.cc:3744] ModelInferHandler::InferResponseComplete, 0 step ISSUED
I0801 13:59:33.456765 1 grpc_server.cc:2712] GRPC free: size 353376, addr 0x7f382d7785a0
I0801 13:59:33.457224 1 grpc_server.cc:3744] ModelInferHandler::InferResponseComplete, 0 step ISSUED
I0801 13:59:33.457248 1 grpc_server.cc:2712] GRPC free: size 353376, addr 0x7f382d3c6ce0
I0801 13:59:33.457244 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE
I0801 13:59:33.457275 1 grpc_server.cc:2502] Done for ModelInferHandler, 0
I0801 13:59:33.457486 1 grpc_server.cc:3310] ModelInferHandler::InferRequestComplete
I0801 13:59:33.457493 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE
I0801 13:59:33.457510 1 grpc_server.cc:3310] ModelInferHandler::InferRequestComplete
I0801 13:59:33.457511 1 grpc_server.cc:2502] Done for ModelInferHandler, 0
I0801 13:59:33.457527 1 tensorflow.cc:2182] TRITONBACKEND_ModelExecute: model my_model released 2 requests
I0801 13:59:33.457565 1 tensorflow.cc:2430] model my_model, instance my_model, executing 1 requests
I0801 13:59:33.457575 1 tensorflow.cc:1598] TRITONBACKEND_ModelExecute: Running my_model with 1 requests
I0801 13:59:33.457732 1 tensorflow.cc:1850] TRITONBACKEND_ModelExecute: input 'images' is GPU tensor: false
I0801 13:59:33.466131 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step START
I0801 13:59:33.466171 1 grpc_server.cc:3585] New request handler for ModelInferHandler, 0
I0801 13:59:33.466218 1 infer_request.cc:710] prepared: [0x0x7f382c026a50] request id: Thread Num: 1, Iter: 1, Second, model: my_model, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 6, priority: 0, timeout (us): 0
original inputs:
[0x0x7f382c080b08] input: images, type: UINT8, original shape: [6,256,336,3], batch + shape: [6,256,336,3], shape: [256,336,3]
override inputs:
inputs:
[0x0x7f382c080b08] input: images, type: UINT8, original shape: [6,256,336,3], batch + shape: [6,256,336,3], shape: [256,336,3]
original requested outputs:
softmax
requested outputs:
softmax
I0801 13:59:33.474220 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step START
I0801 13:59:33.474277 1 grpc_server.cc:3585] New request handler for ModelInferHandler, 0
I0801 13:59:33.474342 1 infer_request.cc:710] prepared: [0x0x7f382d567cc0] request id: Thread Num: 2, Iter: 1, Second, model: my_model, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 6, priority: 0, timeout (us): 0
original inputs:
[0x0x7f382c1184f8] input: images, type: UINT8, original shape: [6,256,336,3], batch + shape: [6,256,336,3], shape: [256,336,3]
override inputs:
inputs:
[0x0x7f382c1184f8] input: images, type: UINT8, original shape: [6,256,336,3], batch + shape: [6,256,336,3], shape: [256,336,3]
original requested outputs:
softmax
requested outputs:
softmax
I0801 13:59:33.495876 1 infer_response.cc:167] add response output: output: softmax, type: FP32, shape: [2,1,1,11043]
I0801 13:59:33.495979 1 grpc_server.cc:2592] GRPC: using buffer for 'softmax', size: 88344, addr: 0x7f382d8500d0
I0801 13:59:33.496013 1 tensorflow.cc:2124] TRITONBACKEND_ModelExecute: output 'softmax' is GPU tensor: false
I0801 13:59:33.496027 1 grpc_server.cc:3744] ModelInferHandler::InferResponseComplete, 0 step ISSUED
I0801 13:59:33.496053 1 grpc_server.cc:2712] GRPC free: size 88344, addr 0x7f382d8500d0
I0801 13:59:33.496359 1 grpc_server.cc:3310] ModelInferHandler::InferRequestComplete
I0801 13:59:33.496382 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE
I0801 13:59:33.496391 1 tensorflow.cc:2182] TRITONBACKEND_ModelExecute: model my_model released 1 requests
I0801 13:59:33.496455 1 tensorflow.cc:2430] model my_model, instance my_model, executing 2 requests
I0801 13:59:33.496467 1 tensorflow.cc:1598] TRITONBACKEND_ModelExecute: Running my_model with 2 requests
I0801 13:59:33.497082 1 tensorflow.cc:1850] TRITONBACKEND_ModelExecute: input 'images' is GPU tensor: false
I0801 13:59:33.497389 1 grpc_server.cc:2502] Done for ModelInferHandler, 0
I0801 13:59:33.516322 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step START
I0801 13:59:33.516474 1 grpc_server.cc:3585] New request handler for ModelInferHandler, 0
I0801 13:59:33.516658 1 infer_request.cc:710] prepared: [0x0x7f38244cd4d0] request id: Thread Num: 0, Iter: 1, Fourth, model: my_model, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 6, priority: 0, timeout (us): 0
original inputs:
[0x0x7f382401c778] input: images, type: UINT8, original shape: [6,256,256,3], batch + shape: [6,256,256,3], shape: [256,256,3]
override inputs:
inputs:
[0x0x7f382401c778] input: images, type: UINT8, original shape: [6,256,256,3], batch + shape: [6,256,256,3], shape: [256,256,3]
original requested outputs:
softmax
requested outputs:
softmax
I0801 13:59:33.655503 1 infer_response.cc:167] add response output: output: softmax, type: FP32, shape: [8,1,1,11043]
I0801 13:59:33.655805 1 grpc_server.cc:2592] GRPC: using buffer for 'softmax', size: 353376, addr: 0x7f382d4735c0
I0801 13:59:33.655869 1 infer_response.cc:167] add response output: output: softmax, type: FP32, shape: [6,1,1,11043]
I0801 13:59:33.656475 1 grpc_server.cc:2592] GRPC: using buffer for 'softmax', size: 265032, addr: 0x7f382d80f570
I0801 13:59:33.656539 1 tensorflow.cc:2124] TRITONBACKEND_ModelExecute: output 'softmax' is GPU tensor: false
I0801 13:59:33.656553 1 grpc_server.cc:3744] ModelInferHandler::InferResponseComplete, 0 step ISSUED
I0801 13:59:33.656573 1 grpc_server.cc:2712] GRPC free: size 353376, addr 0x7f382d4735c0
I0801 13:59:33.657038 1 grpc_server.cc:3744] ModelInferHandler::InferResponseComplete, 0 step ISSUED
I0801 13:59:33.657070 1 grpc_server.cc:2712] GRPC free: size 265032, addr 0x7f382d80f570
I0801 13:59:33.657117 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE
I0801 13:59:33.657152 1 grpc_server.cc:2502] Done for ModelInferHandler, 0
I0801 13:59:33.657456 1 grpc_server.cc:3310] ModelInferHandler::InferRequestComplete
I0801 13:59:33.657469 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE
I0801 13:59:33.657476 1 grpc_server.cc:3310] ModelInferHandler::InferRequestComplete
I0801 13:59:33.657491 1 grpc_server.cc:2502] Done for ModelInferHandler, 0
I0801 13:59:33.657512 1 tensorflow.cc:2182] TRITONBACKEND_ModelExecute: model my_model released 2 requests
I0801 13:59:33.657552 1 tensorflow.cc:2430] model my_model, instance my_model, executing 2 requests
I0801 13:59:33.657563 1 tensorflow.cc:1598] TRITONBACKEND_ModelExecute: Running my_model with 2 requests
I0801 13:59:33.658817 1 tensorflow.cc:1850] TRITONBACKEND_ModelExecute: input 'images' is GPU tensor: false
I0801 13:59:33.674791 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step START
I0801 13:59:33.674845 1 grpc_server.cc:3585] New request handler for ModelInferHandler, 0
I0801 13:59:33.674910 1 infer_request.cc:710] prepared: [0x0x7f382c027200] request id: Thread Num: 1, Iter: 1, Third, model: my_model, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 2, priority: 0, timeout (us): 0
original inputs:
[0x0x7f3826eabcc8] input: images, type: UINT8, original shape: [2,384,256,3], batch + shape: [2,384,256,3], shape: [384,256,3]
override inputs:
inputs:
[0x0x7f3826eabcc8] input: images, type: UINT8, original shape: [2,384,256,3], batch + shape: [2,384,256,3], shape: [384,256,3]
original requested outputs:
softmax
requested outputs:
softmax
I0801 13:59:33.675642 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step START
I0801 13:59:33.675695 1 grpc_server.cc:3585] New request handler for ModelInferHandler, 0
I0801 13:59:33.675740 1 infer_request.cc:710] prepared: [0x0x7f382d37feb0] request id: Thread Num: 3, Iter: 1, Second, model: my_model, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 6, priority: 0, timeout (us): 0
original inputs:
[0x0x7f382c080b08] input: images, type: UINT8, original shape: [6,256,336,3], batch + shape: [6,256,336,3], shape: [256,336,3]
override inputs:
inputs:
[0x0x7f382c080b08] input: images, type: UINT8, original shape: [6,256,336,3], batch + shape: [6,256,336,3], shape: [256,336,3]
original requested outputs:
softmax
requested outputs:
softmax
I0801 13:59:33.765199 1 infer_response.cc:167] add response output: output: softmax, type: FP32, shape: [6,1,1,11043]
I0801 13:59:33.765762 1 grpc_server.cc:2592] GRPC: using buffer for 'softmax', size: 265032, addr: 0x7f382ef73f70
I0801 13:59:33.765817 1 infer_response.cc:167] add response output: output: softmax, type: FP32, shape: [6,1,1,11043]
I0801 13:59:33.766032 1 grpc_server.cc:2592] GRPC: using buffer for 'softmax', size: 265032, addr: 0x7f382d3c6ce0
I0801 13:59:33.766081 1 tensorflow.cc:2124] TRITONBACKEND_ModelExecute: output 'softmax' is GPU tensor: false
I0801 13:59:33.766094 1 grpc_server.cc:3744] ModelInferHandler::InferResponseComplete, 0 step ISSUED
I0801 13:59:33.766119 1 grpc_server.cc:2712] GRPC free: size 265032, addr 0x7f382ef73f70
I0801 13:59:33.766537 1 grpc_server.cc:3744] ModelInferHandler::InferResponseComplete, 0 step ISSUED
I0801 13:59:33.766555 1 grpc_server.cc:2712] GRPC free: size 265032, addr 0x7f382d3c6ce0
I0801 13:59:33.766583 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE
I0801 13:59:33.766624 1 grpc_server.cc:2502] Done for ModelInferHandler, 0
I0801 13:59:33.766774 1 grpc_server.cc:3310] ModelInferHandler::InferRequestComplete
I0801 13:59:33.766794 1 grpc_server.cc:3310] ModelInferHandler::InferRequestComplete
I0801 13:59:33.766810 1 tensorflow.cc:2182] TRITONBACKEND_ModelExecute: model my_model released 2 requests
I0801 13:59:33.766837 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE
I0801 13:59:33.766846 1 tensorflow.cc:2430] model my_model, instance my_model, executing 1 requests
I0801 13:59:33.766862 1 tensorflow.cc:1598] TRITONBACKEND_ModelExecute: Running my_model with 1 requests
I0801 13:59:33.766868 1 grpc_server.cc:2502] Done for ModelInferHandler, 0
I0801 13:59:33.767070 1 tensorflow.cc:1850] TRITONBACKEND_ModelExecute: input 'images' is GPU tensor: false
I0801 13:59:33.773161 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step START
I0801 13:59:33.773365 1 grpc_server.cc:3585] New request handler for ModelInferHandler, 0
I0801 13:59:33.773456 1 infer_request.cc:710] prepared: [0x0x7f382c01acc0] request id: Thread Num: 2, Iter: 1, Third, model: my_model, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 2, priority: 0, timeout (us): 0
original inputs:
[0x0x7f378035fd88] input: images, type: UINT8, original shape: [2,384,256,3], batch + shape: [2,384,256,3], shape: [384,256,3]
override inputs:
inputs:
[0x0x7f378035fd88] input: images, type: UINT8, original shape: [2,384,256,3], batch + shape: [2,384,256,3], shape: [384,256,3]
original requested outputs:
softmax
requested outputs:
softmax
I0801 13:59:33.796758 1 infer_response.cc:167] add response output: output: softmax, type: FP32, shape: [2,1,1,11043]
I0801 13:59:33.796995 1 grpc_server.cc:2592] GRPC: using buffer for 'softmax', size: 88344, addr: 0x7f382d7cea10
I0801 13:59:33.797024 1 tensorflow.cc:2124] TRITONBACKEND_ModelExecute: output 'softmax' is GPU tensor: false
I0801 13:59:33.797036 1 grpc_server.cc:3744] ModelInferHandler::InferResponseComplete, 0 step ISSUED
I0801 13:59:33.797055 1 grpc_server.cc:2712] GRPC free: size 88344, addr 0x7f382d7cea10
I0801 13:59:33.797528 1 grpc_server.cc:3310] ModelInferHandler::InferRequestComplete
I0801 13:59:33.797551 1 tensorflow.cc:2182] TRITONBACKEND_ModelExecute: model my_model released 1 requests
I0801 13:59:33.797574 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE
I0801 13:59:33.797593 1 tensorflow.cc:2430] model my_model, instance my_model, executing 2 requests
I0801 13:59:33.797596 1 grpc_server.cc:2502] Done for ModelInferHandler, 0
I0801 13:59:33.797625 1 tensorflow.cc:1598] TRITONBACKEND_ModelExecute: Running my_model with 2 requests
I0801 13:59:33.800684 1 grpc_server.cc:3744] ModelInferHandler::InferResponseComplete, 0 step ISSUED
I0801 13:59:33.800832 1 tensorflow.cc:1850] TRITONBACKEND_ModelExecute: input 'images' is GPU tensor: false
I0801 13:59:33.800881 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE
I0801 13:59:33.800900 1 grpc_server.cc:2502] Done for ModelInferHandler, 0
I0801 13:59:33.810162 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step START
I0801 13:59:33.810218 1 grpc_server.cc:3585] New request handler for ModelInferHandler, 0
I0801 13:59:33.810275 1 infer_request.cc:710] prepared: [0x0x7f38240275a0] request id: Thread Num: 1, Iter: 1, Fourth, model: my_model, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 6, priority: 0, timeout (us): 0
original inputs:
[0x0x7f3826eabcc8] input: images, type: UINT8, original shape: [6,256,256,3], batch + shape: [6,256,256,3], shape: [256,256,3]
override inputs:
inputs:
[0x0x7f3826eabcc8] input: images, type: UINT8, original shape: [6,256,256,3], batch + shape: [6,256,256,3], shape: [256,256,3]
original requested outputs:
softmax
requested outputs:
softmax
I0801 13:59:33.860404 1 infer_response.cc:167] add response output: output: softmax, type: FP32, shape: [6,1,1,11043]
I0801 13:59:33.860962 1 grpc_server.cc:2592] GRPC: using buffer for 'softmax', size: 265032, addr: 0x7f382d7785a0
I0801 13:59:33.861014 1 tensorflow.cc:2124] TRITONBACKEND_ModelExecute: output 'softmax' is GPU tensor: false
I0801 13:59:33.861029 1 grpc_server.cc:3744] ModelInferHandler::InferResponseComplete, 0 step ISSUED
I0801 13:59:33.861048 1 grpc_server.cc:2712] GRPC free: size 265032, addr 0x7f382d7785a0
I0801 13:59:33.861405 1 grpc_server.cc:3310] ModelInferHandler::InferRequestComplete
I0801 13:59:33.861424 1 grpc_server.cc:3310] ModelInferHandler::InferRequestComplete
I0801 13:59:33.861437 1 tensorflow.cc:2182] TRITONBACKEND_ModelExecute: model my_model released 2 requests
I0801 13:59:33.861458 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE
I0801 13:59:33.861472 1 tensorflow.cc:2430] model my_model, instance my_model, executing 1 requests
I0801 13:59:33.861502 1 grpc_server.cc:2502] Done for ModelInferHandler, 0
I0801 13:59:33.861533 1 tensorflow.cc:1598] TRITONBACKEND_ModelExecute: Running my_model with 1 requests
I0801 13:59:33.861747 1 tensorflow.cc:1850] TRITONBACKEND_ModelExecute: input 'images' is GPU tensor: false
I0801 13:59:33.868226 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step START
I0801 13:59:33.868288 1 grpc_server.cc:3585] New request handler for ModelInferHandler, 0
I0801 13:59:33.868351 1 infer_request.cc:710] prepared: [0x0x7f37935c9b00] request id: Thread Num: 3, Iter: 1, Third, model: my_model, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 2, priority: 0, timeout (us): 0
original inputs:
[0x0x7f382c080b08] input: images, type: UINT8, original shape: [2,384,256,3], batch + shape: [2,384,256,3], shape: [384,256,3]
override inputs:
inputs:
[0x0x7f382c080b08] input: images, type: UINT8, original shape: [2,384,256,3], batch + shape: [2,384,256,3], shape: [384,256,3]
original requested outputs:
softmax
requested outputs:
softmax
I0801 13:59:33.869615 1 grpc_server.cc:270] Process for ServerLive, rpc_ok=1, 1048 step START
I0801 13:59:33.869639 1 grpc_server.cc:225] Ready for RPC 'ServerLive', 1049
I0801 13:59:33.869735 1 grpc_server.cc:270] Process for ServerLive, rpc_ok=1, 1048 step COMPLETE
I0801 13:59:33.869747 1 grpc_server.cc:411] Done for ServerLive, 1048
I0801 13:59:33.903047 1 infer_response.cc:167] add response output: output: softmax, type: FP32, shape: [6,1,1,11043]
I0801 13:59:33.903604 1 grpc_server.cc:2592] GRPC: using buffer for 'softmax', size: 265032, addr: 0x7f382d8500d0
I0801 13:59:33.903657 1 tensorflow.cc:2124] TRITONBACKEND_ModelExecute: output 'softmax' is GPU tensor: false
I0801 13:59:33.903671 1 grpc_server.cc:3744] ModelInferHandler::InferResponseComplete, 0 step ISSUED
I0801 13:59:33.903694 1 grpc_server.cc:2712] GRPC free: size 265032, addr 0x7f382d8500d0
I0801 13:59:33.904137 1 grpc_server.cc:3310] ModelInferHandler::InferRequestComplete
I0801 13:59:33.904160 1 tensorflow.cc:2182] TRITONBACKEND_ModelExecute: model my_model released 1 requests
I0801 13:59:33.904158 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE
I0801 13:59:33.904177 1 grpc_server.cc:2502] Done for ModelInferHandler, 0
I0801 13:59:33.904194 1 tensorflow.cc:2430] model my_model, instance my_model, executing 1 requests
I0801 13:59:33.904204 1 tensorflow.cc:1598] TRITONBACKEND_ModelExecute: Running my_model with 1 requests
I0801 13:59:33.905482 1 tensorflow.cc:1850] TRITONBACKEND_ModelExecute: input 'images' is GPU tensor: false
I0801 13:59:33.934548 1 infer_response.cc:167] add response output: output: softmax, type: FP32, shape: [2,1,1,11043]
I0801 13:59:33.934619 1 grpc_server.cc:2592] GRPC: using buffer for 'softmax', size: 88344, addr: 0x7f382de8ea70
I0801 13:59:33.934649 1 tensorflow.cc:2124] TRITONBACKEND_ModelExecute: output 'softmax' is GPU tensor: false
I0801 13:59:33.934662 1 grpc_server.cc:3744] ModelInferHandler::InferResponseComplete, 0 step ISSUED
I0801 13:59:33.934680 1 grpc_server.cc:2712] GRPC free: size 88344, addr 0x7f382de8ea70
I0801 13:59:33.934981 1 grpc_server.cc:3310] ModelInferHandler::InferRequestComplete
I0801 13:59:33.935003 1 tensorflow.cc:2182] TRITONBACKEND_ModelExecute: model my_model released 1 requests
I0801 13:59:33.935019 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE
I0801 13:59:33.935043 1 grpc_server.cc:2502] Done for ModelInferHandler, 0
I0801 13:59:33.942498 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step START
I0801 13:59:33.942533 1 grpc_server.cc:3585] New request handler for ModelInferHandler, 0
I0801 13:59:33.942598 1 infer_request.cc:710] prepared: [0x0x7f38240275a0] request id: Thread Num: 3, Iter: 1, Fourth, model: my_model, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 6, priority: 0, timeout (us): 0
original inputs:
[0x0x7f3826b85748] input: images, type: UINT8, original shape: [6,256,256,3], batch + shape: [6,256,256,3], shape: [256,256,3]
override inputs:
inputs:
[0x0x7f3826b85748] input: images, type: UINT8, original shape: [6,256,256,3], batch + shape: [6,256,256,3], shape: [256,256,3]
original requested outputs:
softmax
requested outputs:
softmax
I0801 13:59:33.942687 1 tensorflow.cc:2430] model my_model, instance my_model, executing 1 requests
I0801 13:59:33.942707 1 tensorflow.cc:1598] TRITONBACKEND_ModelExecute: Running my_model with 1 requests
I0801 13:59:33.942931 1 tensorflow.cc:1850] TRITONBACKEND_ModelExecute: input 'images' is GPU tensor: false
I0801 13:59:33.981482 1 infer_response.cc:167] add response output: output: softmax, type: FP32, shape: [6,1,1,11043]
I0801 13:59:33.981596 1 grpc_server.cc:2592] GRPC: using buffer for 'softmax', size: 265032, addr: 0x7f382d3c6ce0
I0801 13:59:33.981660 1 tensorflow.cc:2124] TRITONBACKEND_ModelExecute: output 'softmax' is GPU tensor: false
I0801 13:59:33.981681 1 grpc_server.cc:3744] ModelInferHandler::InferResponseComplete, 0 step ISSUED
I0801 13:59:33.981712 1 grpc_server.cc:2712] GRPC free: size 265032, addr 0x7f382d3c6ce0
I0801 13:59:33.982115 1 grpc_server.cc:3310] ModelInferHandler::InferRequestComplete
I0801 13:59:33.982129 1 grpc_server.cc:3592] Process for ModelInferHandler, rpc_ok=1, 0 step COMPLETE
I0801 13:59:33.982147 1 tensorflow.cc:2182] TRITONBACKEND_ModelExecute: model my_model released 1 requests
These are the logs from my script:
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2138112 for input 'images', expecting 2064384
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 2138112 for input 'images', expecting 2064384
In my run() function, I changed it to only 2 iterations, so there should have been a total of 32 inferences (4 threads x 2 iterations x 4 inferences per iteration). From the server logs, it looks like only 30 reached the server.
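For what it's worth, the failing sizes look consistent with two differently shaped requests being merged into one batch. This is just my own arithmetic (assuming 1 byte per UINT8 element), so treat it as a guess:
6 * 256 * 336 * 3 = 1548288  (the [6,256,336,3] request)
2 * 384 * 256 * 3 = 589824   (the [2,384,256,3] request)
1548288 + 589824 = 2138112   (the "unexpected total byte size")
8 * 256 * 336 * 3 = 2064384  (a full batch of 8 at [256,336,3], the "expecting" value)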
Also, I've noticed that setting compression_algorithm to gzip in the infer call seems to significantly reduce the chances of these errors being thrown.
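For reference, this is the infer call with compression enabled; compression_algorithm is an existing parameter of the gRPC client's infer(), and 'gzip' is one of its accepted values:
res = triton_client.infer(
    model_name='my_model',
    inputs=[model_input],
    outputs=[InferRequestedOutput('softmax')],
    compression_algorithm='gzip')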
I'm currently doing some testing, and it seems this issue started in Release 2.20.0, corresponding to NGC container 22.03. I can't seem to recreate this in 22.02 so far.
I noticed that if we set the same value for preferred_batch_size and max_batch_size it works fine, so I suspect it may have something to do with the dynamic batcher.
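As a sketch of what I mean in config.pbtxt (using the max_batch_size from my config further down; the specific value isn't the point):
max_batch_size: 16
dynamic_batching {
  preferred_batch_size: 16
}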
Sorry for the late response. It does look like the dynamic batcher is not batching the requests properly; the server log suggests that it is batching requests with different shapes. Can you try one more thing: set max_queue_delay_microseconds to a really large value?
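For example, something along these lines in config.pbtxt; the exact value is arbitrary, just large enough that full preferred batches can form:
dynamic_batching {
  preferred_batch_size: 4
  max_queue_delay_microseconds: 5000000
}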
Can you also share the model or a synthetic one that we can reproduce the issue on our end?
I was able to recreate this with the first TensorFlow SavedModel I found: https://tfhub.dev/google/imagenet/efficientnet_v2_imagenet1k_b0/feature_vector/2
config.pbtxt I used:
name: "my_model"
platform: "tensorflow_savedmodel"
max_batch_size: 16
input {
name: "input_1"
data_type: TYPE_FP32
format: FORMAT_NHWC
dims: -1
dims: -1
dims: 3
}
output {
name: "output_1"
data_type: TYPE_FP32
dims: 1280
}
dynamic_batching {
preferred_batch_size: 4
max_queue_delay_microseconds: 500
}
Triton server start up command:
tritonserver --model-repository=/root --model-control-mode=explicit --strict-model-config=true --backend-config=tensorflow,version=2 --backend-config=python,python-runtime=/usr/bin/ --log-verbose=2 --log-info=true --log-warning=true --log-error=true
Triton container version:
nvcr.io/nvidia/tritonserver:22.07-py3
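For completeness, the container is launched roughly like this (a sketch; the host model-repository path and the mounted ports are my local setup):
docker run --gpus all --rm --shm-size=1g \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v /path/to/model/repo:/root \
  nvcr.io/nvidia/tritonserver:22.07-py3 \
  <the tritonserver command shown above>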
Updated testing script:
import logging
from concurrent import futures
from tritonclient.grpc import InferenceServerClient, InferInput, InferRequestedOutput
import numpy as np
logging.getLogger().setLevel(logging.INFO)
triton_client = InferenceServerClient('127.0.0.1:8001')
triton_client.unload_model('my_model')
triton_client.load_model('my_model')
def run():
    for i in range(10):
        print(i)
        try:
            # Four requests per iteration, each with a different input shape,
            # so the dynamic batcher sees variable-shaped requests in flight.
            for shape in [(8, 256, 384, 3), (6, 256, 336, 3),
                          (2, 384, 256, 3), (6, 256, 256, 3)]:
                input_array = np.random.randint(0, 255, shape, dtype=np.uint8).astype('float32')
                model_input = InferInput('input_1', input_array.shape, 'FP32')
                model_input.set_data_from_numpy(input_array)
                res = triton_client.infer(model_name='my_model',
                                          inputs=[model_input],
                                          outputs=[InferRequestedOutput('output_1')])
        except Exception as e:
            logging.info(e)
# Four client threads issuing requests concurrently.
with futures.ThreadPoolExecutor(4) as pool:
    fs = [pool.submit(run) for _ in range(4)]
    for f in fs:
        f.result()
One sample run output:
0
0
0
0
1
1
1
1
2
2
2
2
3
3
3
3
4
4
4
4
5
5
6
5
6
5
7
6
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 15630336 for input 'input_1', expecting 14450688
8
7
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 10911744 for input 'input_1', expecting 9437184
9
6
7
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 8552448 for input 'input_1', expecting 8257536
8
INFO:root:[StatusCode.INVALID_ARGUMENT] unexpected total byte size 10911744 for input 'input_1', expecting 9437184
9
8
7
9
8
9
It worked fine when running with only a single thread.
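That is, the same script but with a single worker, along these lines:
with futures.ThreadPoolExecutor(1) as pool:
    pool.submit(run).result()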
This issue should be fixed by https://github.com/triton-inference-server/core/pull/114
Thank you for the fix!
Will this fix be in the upcoming NGC container 22.08?
This fix didn't make it into 22.08; it will be available in the 22.09 release.
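Once published, that release should be pullable under the usual tag scheme, e.g.:
docker pull nvcr.io/nvidia/tritonserver:22.09-py3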
Closing this as the issue has been fixed.