How to improve throughput on tritonserver
Description:
This is the perf_analyzer result when I start a single tritonserver:
After that, I enabled the SO_REUSEPORT option in gRPC and started 8 tritonserver instances.
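A rough sketch of what that multi-instance launch looks like is below; the SO_REUSEPORT behavior itself comes from the modified gRPC endpoint in the server build, and the model repository path and the disabled HTTP/metrics endpoints are just placeholders.

```bash
# Sketch only: assumes a tritonserver build whose gRPC endpoint sets SO_REUSEPORT,
# so every instance can bind the same gRPC port and the kernel spreads incoming
# connections across them. /models is a placeholder model repository.
for i in $(seq 1 8); do
  tritonserver \
    --model-repository=/models \
    --grpc-port=8001 \
    --allow-http=false \
    --allow-metrics=false &
done
wait
```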
This is the result of perf_analyzer:
I would like to know whether the results of this experiment indicate that the bottleneck is in gRPC. I then tried increasing some parameters to improve the throughput of a single tritonserver, but the gains were very small, for example:
#define REGISTER_GRPC_INFER_THREAD_COUNT 2 -> 64
or increasing the number of completion queues (cq_queue).
I can also provide the perf profile of tritonserver:
I know that throughput can be improved by properly configuring batch_size and the number of concurrent threads with a profiler, or by modifying config.pbtxt, e.g. enabling dynamic batching, but what I want to confirm is whether gRPC is the bottleneck. May I ask how to improve the throughput of tritonserver with batch_size = 1?
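For reference, by "modifying config.pbtxt" and tuning concurrency I mean something like the following; the model name, batch sizes, and queue delay are placeholders, not tuned values.

```bash
# Sketch: append a dynamic_batching block to a model's config.pbtxt (model name
# and values are placeholders), then sweep client concurrency at batch_size = 1.
cat >> /models/mymodel/config.pbtxt <<'EOF'
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}
EOF

perf_analyzer -m mymodel -i grpc -b 1 --concurrency-range 1:256:32
```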
Hi @bilibiliGO283, if you want to confirm whether gRPC is the bottleneck, maybe you can run the same experiment using perf_analyzer through the C API so that you can compare the outcomes and see the overhead that gRPC introduces. @Tabrizian Are you able to provide more context for this?
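A typical C API invocation of perf_analyzer looks roughly like this; the model name and paths are placeholders:

```bash
# Sketch: drive the model through the in-process Triton C API instead of gRPC,
# which removes the network/protocol layer from the measurement.
perf_analyzer -m mymodel \
    --service-kind=triton_c_api \
    --triton-server-directory=/opt/tritonserver \
    --model-repository=/models \
    -b 1 --concurrency-range 1:256:32
```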
Thank you very much for your reply, @krishung5.
Here is the perf_analyzer output through the C API:
concurrency: 1
concurrency: 128
concurrency: 256
It seems there is a small problem in the queue-time calculation. The throughput at this point reaches the upper limit of a single tritonserver, and it seems that gRPC is not the bottleneck. What would the bottleneck be?
Hi @bilibiliGO283, the referenced bug was fixed in this PR: https://github.com/triton-inference-server/client/pull/124
The patched client will be included in the 22.07 release coming out soon. To get it sooner, you could also build the client off r22.07 or main.
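If you want to build it yourself, something along these lines should work; the exact cmake options depend on your platform and which clients you need, so check the client repo's README:

```bash
# Sketch: check out the branch containing the fix and build the clients.
# Option names may differ between releases; see the repo README for the full flow.
git clone --branch r22.07 https://github.com/triton-inference-server/client.git
cd client && mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=$(pwd)/install \
      -DTRITON_ENABLE_PERF_ANALYZER=ON ..
make -j$(nproc)
```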
Closing issue due to lack of activity. Please re-open the issue if you would like to follow up with this.