All gRPC requests to the Triton server are timing out, but HTTP requests are functioning normally.
Description: All gRPC requests to the Triton server are timing out, but HTTP requests are functioning normally.
Triton Information: 23.10
Are you using the Triton container or did you build it yourself? Container from NGC.
To Reproduce: When using the TensorRT backend, I often encounter a large number of gRPC connection timeouts, while HTTP requests work fine. This suggests the problem is not with the model but with the RPC layer. After restarting the service, gRPC requests return to normal.
Expected behavior
After analyzing a TCP packet capture, gRPC port 8001 looks normal: the request is confirmed to reach tritonserver, but it ultimately times out.
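This symptom (the TCP handshake succeeds but the RPC still times out) can be illustrated with a self-contained stdlib sketch; the stalled listener below is a stand-in for a server whose port is reachable while its request-handling layer is stuck, not Triton's actual behavior:

```python
import socket
import threading

def start_stalled_server(host="127.0.0.1"):
    """Listener that accepts TCP connections but never sends a response,
    mimicking a server whose port is reachable while the RPC layer is stuck."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, 0))
    srv.listen(1)
    port = srv.getsockname()[1]

    def accept_and_stall():
        conn, _ = srv.accept()
        # Accept the connection, then go silent: no application-layer reply.
        threading.Event().wait(2.0)
        conn.close()
        srv.close()

    threading.Thread(target=accept_and_stall, daemon=True).start()
    return port

port = start_stalled_server()

# TCP connect succeeds -- a packet capture would show a normal handshake.
client = socket.create_connection(("127.0.0.1", port), timeout=1.0)

# ...but waiting for an application-layer response times out.
client.settimeout(0.5)
try:
    data = client.recv(1024)
    timed_out = data == b""   # server closed without ever replying
except socket.timeout:
    timed_out = True

print("connected at TCP level, response timed out:", timed_out)
```

So a clean capture on port 8001 only rules out the network path; it doesn't tell us whether the gRPC frontend ever hands the request to the core.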
@tanmayv25 @Tabrizian @CoderHam Sincerely asking for help!
I'm hitting a similar issue. I'm deploying tritonserver on a T4 in a Docker container and running inference via the HTTP/gRPC endpoints. The models are TensorRT engines converted from ONNX. After the server starts, everything works fine, but at some point gRPC inference blocks: the statistics show that no requests are being performed. If I switch to the HTTP client, inference works. It seems the gRPC infer call is blocking; maybe the request is never passed to the core engine?
After setting --log-verbose=2, I found more information: it seems requests cannot be fetched from the completion queue (cq).
Hope this finding helps the investigation.
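While this is being investigated, one client-side mitigation is to put a deadline on infer calls instead of letting them hang (the tritonclient gRPC client also exposes a `client_timeout` argument on `infer()`; check your client version). A generic stdlib sketch of the same idea, with a sleeping function standing in for the stuck gRPC call:

```python
import concurrent.futures
import time

pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)

def call_with_deadline(fn, timeout_s):
    """Run fn in a worker thread and raise TimeoutError if it exceeds
    the deadline, instead of blocking the caller indefinitely."""
    future = pool.submit(fn)
    return future.result(timeout=timeout_s)

def blocked_infer():
    # Stand-in for a gRPC infer call that never completes.
    time.sleep(1.0)
    return "response"

try:
    call_with_deadline(blocked_infer, timeout_s=0.2)
    outcome = "ok"
except concurrent.futures.TimeoutError:
    outcome = "timed out"

print(outcome)
```

This doesn't fix the server-side stall, but it turns a silent hang into an observable timeout you can alert on.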
Thank you for reporting this, I filed a ticket for our team to investigate: 6211
If the service state is mistakenly judged as shut down and there are no new requests in the completion queue (cq), would that block all RPC requests?
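To make that hypothesis concrete, here is a toy Python sketch (not Triton's actual C++ completion-queue code) of a polling loop that exits once a shutdown flag is observed; any request enqueued afterwards is never processed, which would look exactly like "all gRPC requests time out" from the client side:

```python
import queue
import threading
import time

requests = queue.Queue()
processed = []
shutdown = threading.Event()

def consumer():
    """Drain the request queue, like a completion-queue polling loop.
    Once the shutdown flag is observed, the loop exits and nothing
    enqueued afterwards is ever handled."""
    while not shutdown.is_set():
        try:
            item = requests.get(timeout=0.1)
        except queue.Empty:
            continue
        processed.append(item)

t = threading.Thread(target=consumer)
t.start()

requests.put("req-1")
time.sleep(0.3)          # req-1 is handled normally

shutdown.set()           # state mistakenly flips to "shutdown"
t.join()

requests.put("req-2")    # arrives after the loop exited: stuck forever
time.sleep(0.3)

print(processed)
```

Note how the HTTP path would be unaffected in this scenario, since it doesn't go through the same queue, matching the behavior reported above.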
I'm using the HTTP client in the same environment you described, but I'm facing occasional timeouts: it seems the client can't connect to the server, yet after one occurrence the remaining requests are fine until the problem recurs. Are you seeing this too?
No, I haven't encountered this problem.
Hi, Andreeva. Is there any progress?
The issue is being looked at.
Thanks, looking forward to your reply.