All gRPC requests to the Triton server are timing out, but HTTP requests are functioning normally.
Description: All gRPC requests to the Triton server are timing out, but HTTP requests are functioning normally.
Triton Information: 23.10
Are you using the Triton container or did you build it yourself? Container from NGC.
To Reproduce: When using the TensorRT backend, I often encounter a large number of gRPC connection timeouts, while HTTP requests work fine. This suggests the problem is not with the model but with the RPC layer. After restarting the service, gRPC requests return to normal.
Expected behavior
After analyzing a TCP packet capture, gRPC port 8001 looks normal: the request is confirmed to reach tritonserver, but it ultimately times out.
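This symptom (the TCP handshake succeeds but the RPC still times out) can be illustrated with a self-contained stdlib sketch; the stalled listener below is a stand-in for a server whose port is reachable while its request-handling layer is stuck, not Triton's actual behavior:

```python
import socket
import threading

def start_stalled_server(host="127.0.0.1"):
    """Listener that accepts TCP connections but never sends a response,
    mimicking a server whose port is reachable while the RPC layer is stuck."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, 0))
    srv.listen(1)
    port = srv.getsockname()[1]

    def accept_and_stall():
        conn, _ = srv.accept()
        # Accept the connection, then go silent: no application-layer reply.
        threading.Event().wait(2.0)
        conn.close()
        srv.close()

    threading.Thread(target=accept_and_stall, daemon=True).start()
    return port

port = start_stalled_server()

# TCP connect succeeds -- a packet capture would show a normal handshake.
client = socket.create_connection(("127.0.0.1", port), timeout=1.0)

# ...but waiting for an application-layer response times out.
client.settimeout(0.5)
try:
    data = client.recv(1024)
    timed_out = data == b""   # server closed without ever replying
except socket.timeout:
    timed_out = True

print("connected at TCP level, response timed out:", timed_out)
```

So a clean capture on port 8001 only rules out the network path; it doesn't tell us whether the gRPC frontend ever hands the request to the core.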
@tanmayv25 @Tabrizian @CoderHam Sincerely asking for help!
I'm hitting a similar issue. I'm deploying tritonserver on a T4 in a Docker container and running inference via the HTTP/gRPC endpoints. The models are TensorRT engines converted from ONNX. After the server starts, everything works fine, but at some point gRPC inference blocks: the statistics show that no requests are being performed. If I switch to the HTTP client, inference works. It seems the gRPC infer call is blocking; maybe the request is never passed to the core engine?
After setting --log-verbose=2, I found more information: it seems requests cannot be fetched from the completion queue (cq).
Hope this finding helps the investigation.
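While this is being investigated, one client-side mitigation is to put a deadline on infer calls instead of letting them hang (the tritonclient gRPC client also exposes a `client_timeout` argument on `infer()`; check your client version). A generic stdlib sketch of the same idea, with a sleeping function standing in for the stuck gRPC call:

```python
import concurrent.futures
import time

pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)

def call_with_deadline(fn, timeout_s):
    """Run fn in a worker thread and raise TimeoutError if it exceeds
    the deadline, instead of blocking the caller indefinitely."""
    future = pool.submit(fn)
    return future.result(timeout=timeout_s)

def blocked_infer():
    # Stand-in for a gRPC infer call that never completes.
    time.sleep(1.0)
    return "response"

try:
    call_with_deadline(blocked_infer, timeout_s=0.2)
    outcome = "ok"
except concurrent.futures.TimeoutError:
    outcome = "timed out"

print(outcome)
```

This doesn't fix the server-side stall, but it turns a silent hang into an observable timeout you can alert on.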
Thank you for reporting this, I filed a ticket for our team to investigate: 6211
If the service state is mistakenly judged as shut down and there are no new requests in the completion queue (cq), would that block all RPC requests?
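To make that hypothesis concrete, here is a toy Python sketch (not Triton's actual C++ completion-queue code) of a polling loop that exits once a shutdown flag is observed; any request enqueued afterwards is never processed, which would look exactly like "all gRPC requests time out" from the client side:

```python
import queue
import threading
import time

requests = queue.Queue()
processed = []
shutdown = threading.Event()

def consumer():
    """Drain the request queue, like a completion-queue polling loop.
    Once the shutdown flag is observed, the loop exits and nothing
    enqueued afterwards is ever handled."""
    while not shutdown.is_set():
        try:
            item = requests.get(timeout=0.1)
        except queue.Empty:
            continue
        processed.append(item)

t = threading.Thread(target=consumer)
t.start()

requests.put("req-1")
time.sleep(0.3)          # req-1 is handled normally

shutdown.set()           # state mistakenly flips to "shutdown"
t.join()

requests.put("req-2")    # arrives after the loop exited: stuck forever
time.sleep(0.3)

print(processed)
```

Note how the HTTP path would be unaffected in this scenario, since it doesn't go through the same queue, matching the behavior reported above.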
I'm using the HTTP client in the same environment you described, but I'm facing occasional timeouts: it seems the client can't connect to the server, yet after one occurrence the remaining requests are fine until the problem recurs. Are you seeing this too?
No, I haven't encountered this problem.
Hi, Andreeva. Is there any progress?
The issue is being looked at.
Thanks, looking forward to your reply.