biaochen
I met a similar issue. I'm deploying tritonserver on a T4 in a Docker container and running inference via the HTTP/gRPC endpoints. The models are TensorRT engines converted from ONNX. After the server is started, everything...
I guess it might be a configuration issue? Should I upload the configuration files?
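For reference, here is a minimal sketch of the ONNX-to-TensorRT conversion step mentioned above, using `trtexec`. The file names (`model.onnx`, `model.plan`) are placeholders; the precision flag and any shape options depend on your model:

```shell
# Build a TensorRT engine from an ONNX model (run on the same GPU
# architecture you will serve on, e.g. the T4, since engines are
# hardware-specific). --fp16 is optional and model-dependent.
trtexec --onnx=model.onnx --saveEngine=model.plan --fp16
```

The resulting `.plan` file then goes into the Triton model repository alongside its `config.pbtxt`.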
I've successfully installed and tested with v0.10.0; closing this issue.
The above issue is solved; I needed to set num_draft_tokens in the SpS request. I've seen a slight performance gain with this setting (Llama 3 8B vs 70B, 4×A100 80G). However, I...
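In case it helps others, a minimal Python sketch of passing num_draft_tokens in the request. This assumes the Triton HTTP generate endpoint and the `ensemble` model name; the field names other than num_draft_tokens (`text_input`, `max_tokens`) follow the common tensorrtllm_backend convention and may differ between versions:

```python
# Sketch: send num_draft_tokens (speculative-sampling draft length) with a
# request to Triton's HTTP generate endpoint. Endpoint path, model name, and
# field names besides num_draft_tokens are assumptions; check your backend's
# model config for the exact input names.
import json
import urllib.request


def build_payload(prompt: str, num_draft_tokens: int, max_tokens: int = 128) -> dict:
    """Build the request body; num_draft_tokens is the draft length per step."""
    return {
        "text_input": prompt,
        "max_tokens": max_tokens,
        "num_draft_tokens": num_draft_tokens,
    }


def generate(prompt: str, num_draft_tokens: int,
             url: str = "http://localhost:8000/v2/models/ensemble/generate") -> dict:
    body = json.dumps(build_payload(prompt, num_draft_tokens)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Tuning num_draft_tokens trades draft-model work against acceptance rate, so the best value depends on the model pair and batch size.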
Maybe this is a simple configuration matter. I can provide all my config files if needed. Thx~