TensorFlow Serving batch inference is slow
Excuse me, how can I solve this slow-speed problem? Doubling the batch size roughly doubles the request time:

shape: (1, 32, 387, 1)  data time: 0.0052  post time: 0.2477  end time: 0.2498
shape: (2, 32, 387, 1)  data time: 0.0056  post time: 0.4651  end time: 0.4694
docker run --runtime=nvidia -it --rm -p 8501:8501 \
  -v "$(pwd)/densenet_ctc:/models/docker_test" \
  -e MODEL_NAME=docker_test tensorflow/serving:latest-gpu \
  --tensorflow_intra_op_parallelism=8 \
  --tensorflow_inter_op_parallelism=8 \
  --enable_batching=true \
  --batching_parameters_file=/models/docker_test/batching_parameters.conf

batching_parameters.conf:

num_batch_threads { value: 4 }
batch_timeout_micros { value: 2000 }
max_batch_size { value: 48 }
max_enqueued_batches { value: 48 }
GPU: 1080 Ti. Thanks.
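For context, a minimal sketch of a REST client that produces this kind of timing breakdown, assuming the docker_test model name from the command above and random data in place of the real images:

import json
import time

import numpy as np
import requests

# Random stand-in for a preprocessed image batch; shape matches the report above.
batch = np.random.rand(2, 32, 387, 1).astype(np.float32)

t0 = time.time()
payload = json.dumps({"instances": batch.tolist()})        # build the request body
t1 = time.time()                                           # "data time"

resp = requests.post(
    "http://localhost:8501/v1/models/docker_test:predict", data=payload)
predictions = resp.json()["predictions"]
t2 = time.time()                                           # "post time" / "end time"

print("data time:", t1 - t0, "post time:", t2 - t1, "end time:", t2 - t0)

Note that "post time" measured this way includes the HTTP round trip plus the full server-side inference, so server-side batching behaviour shows up directly in it.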
@sevenold, can you please let us know what the GPU utilization is during serving (e.g., from nvidia-smi while requests are in flight)? The problem might be low GPU utilization.
Can you please try running the container with the parameters below and let us know if it resolves your issue? Thanks!
--grpc_channel_arguments=grpc.max_concurrent_streams=1000
--per_process_gpu_memory_fraction=0.7
--enable_batching=true
--max_batch_size=10
--batch_timeout_micros=1000
--max_enqueued_batches=1000
--num_batch_threads=6
--batching_parameters_file=/models/flow2_batching.config
--tensorflow_session_parallelism=2
For more information, please refer to #1440.
@rmothukuru I tried running the container with the parameters below, but I get the same result.
docker run --runtime=nvidia -it --rm -p 8501:8501 \
  -v "$(pwd)/densenet_ctc:/models/docker_test" \
  -e MODEL_NAME=docker_test tensorflow/serving:latest-gpu \
  --grpc_channel_arguments=grpc.max_concurrent_streams=1000 \
  --per_process_gpu_memory_fraction=0.7 \
  --enable_batching=true \
  --max_batch_size=128 \
  --batch_timeout_micros=1000 \
  --max_enqueued_batches=1000 \
  --num_batch_threads=8 \
  --batching_parameters_file=/models/docker_test/batching_parameters.conf \
  --tensorflow_session_parallelism=2
GPU utilization is still low.
@sevenold, can you please confirm that you have gone through issue #1440 and that the problem still persists? If so, can you please share your model so that we can reproduce the issue on our side? Thanks!
@rmothukuru Thanks. Here are my model and client: google drive
@rmothukuru I tested my other models, such as a verification-code (CAPTCHA) recognition model, with the same parameters, and GPU prediction works normally for them. Thanks!
Maybe you can try the gRPC channel instead of the REST API.
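A minimal Python sketch of such a gRPC client (the gRPC port 8500 also has to be published in the docker run command; the serving_default signature name and the "input" tensor key below are assumptions, check your SavedModel with saved_model_cli show):

import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

channel = grpc.insecure_channel("localhost:8500")    # gRPC port, not the REST port 8501
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = "docker_test"
request.model_spec.signature_name = "serving_default"   # assumption: default export signature

batch = np.random.rand(2, 32, 387, 1).astype(np.float32)
# "input" is an assumed tensor key; use the name shown by saved_model_cli for your model.
request.inputs["input"].CopyFrom(tf.make_tensor_proto(batch, shape=batch.shape))

result = stub.Predict(request, 10.0)                 # 10 s deadline
print(result.outputs.keys())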
I tried it, but got the same result.
Same question. It seems like TF Serving predicts images in tandem (one after another) even when I post multiple images at once.
What happens when you load the model directly with TF? Do you get significantly better inference latency? If your TF runtime requires X time to do a forward pass on your model on a batch of examples, X becomes a lower bound for your inference latency with TF Serving.
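A rough sketch of how to measure that lower bound directly in the TF runtime, assuming a TF2-style SavedModel; the path and the input keyword are placeholders (check infer.structured_input_signature for the real names):

import time

import numpy as np
import tensorflow as tf

# Placeholder path to the exported SavedModel version directory.
model = tf.saved_model.load("densenet_ctc/1")
infer = model.signatures["serving_default"]          # assumed signature name

batch = tf.constant(np.random.rand(2, 32, 387, 1).astype(np.float32))
infer(input=batch)                                   # warm-up; "input" is an assumed arg name

runs = 20
start = time.time()
for _ in range(runs):
    infer(input=batch)
print("mean forward pass:", (time.time() - start) / runs, "s")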
I found that serialization (of FP16 data) has a large overhead in the gRPC client API, and this heavily drops the QPS. In my case I transfer 3x224x224 inputs, and the serialization cost is twice the server-side processing time for a ResNet50 model.
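The client-side cost can be measured in isolation; a rough sketch that only times building the TensorProto the gRPC client sends, comparing FP32 and FP16 (the 32x3x224x224 batch here is arbitrary):

import time

import numpy as np
import tensorflow as tf

batch = np.random.rand(32, 3, 224, 224)

for dtype in (np.float32, np.float16):
    data = batch.astype(dtype)
    start = time.time()
    proto = tf.make_tensor_proto(data, shape=data.shape)   # what the gRPC client serializes
    print(dtype.__name__, "serialization:", time.time() - start, "s,",
          proto.ByteSize(), "bytes")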
Is this issue solved? I'm having the same problem when serving an OpenNMT TensorFlow model. I have configured --rest_api_num_threads=1000 and --grpc_channel_arguments=grpc.max_concurrent_streams=1000, but they just don't seem to work: the TensorFlow server keeps saying gRPC resource exhausted, and I can't send more than 15 requests from concurrent threads.
@oohx,
Could you please provide some more information for us to debug this issue? We would like to understand how the same model with the same batched data performs in TensorFlow. Could you please share the latency of your model doing inference in the TF runtime and of the same model doing inference in TF Serving?
If your TF runtime requires X time to do a forward pass on your model on a batch of examples, X becomes a lower bound for your inference latency with TF Serving. Also, please refer to the performance guide.
Thank you!
This issue was closed due to lack of activity after being marked stale for the past 14 days.