biaochen

Results 6 comments of biaochen

I'm hitting a similar issue. I'm deploying tritonserver on a T4 in a Docker container and running inference via the http/grpc endpoint. The models are TensorRT engines converted from ONNX. After the server is started, everything...
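For context, this is roughly how I send requests to the running server. It's a minimal sketch using the tritonclient HTTP API; the model name "my_trt_model" and the tensor names/shapes are placeholders, not my real config - adjust them to whatever your config.pbtxt declares.

```python
# Minimal Triton HTTP client sketch. Assumes a hypothetical model "my_trt_model"
# with one FP32 input "input__0" of shape [1, 3, 224, 224] and one output
# "output__0" -- these names are illustrative only.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build a dummy request payload.
data = np.random.rand(1, 3, 224, 224).astype(np.float32)
infer_input = httpclient.InferInput("input__0", list(data.shape), "FP32")
infer_input.set_data_from_numpy(data)

result = client.infer(
    model_name="my_trt_model",
    inputs=[infer_input],
    outputs=[httpclient.InferRequestedOutput("output__0")],
)
print(result.as_numpy("output__0").shape)
```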

> I'm hitting a similar issue. I'm deploying tritonserver on a T4 in a Docker container and running inference via the http/grpc endpoint. The models are TensorRT engines converted from ONNX. After the server is started,...

I guess it may be a configuration issue? Should I upload my configuration files?

I've successfully installed and tested with v0.10.0; closing this issue.

The above issue is solved: I needed to set num_draft_tokens in the sps request. I've seen a slight performance gain in the above setting (llama3 8b vs 70b, 4xA100 80G). However, I...
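For reference, this is roughly how I pass num_draft_tokens along with the prompt. It's only a sketch under assumptions: the model name "tensorrt_llm_bls" and the tensor names "text_input", "num_draft_tokens", and "text_output" are what I'd expect from a typical TensorRT-LLM BLS deployment, but they may differ in your setup.

```python
# Sketch of an sps request that sets num_draft_tokens as an extra input tensor.
# Model/tensor names below are assumptions -- match them to your deployment.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

prompt = np.array([["Explain speculative sampling briefly."]], dtype=object)
text_input = httpclient.InferInput("text_input", list(prompt.shape), "BYTES")
text_input.set_data_from_numpy(prompt)

# Number of tokens the draft model proposes per step (assumed INT32 tensor).
draft = np.array([[4]], dtype=np.int32)
num_draft = httpclient.InferInput("num_draft_tokens", list(draft.shape), "INT32")
num_draft.set_data_from_numpy(draft)

result = client.infer(
    model_name="tensorrt_llm_bls",
    inputs=[text_input, num_draft],
    outputs=[httpclient.InferRequestedOutput("text_output")],
)
print(result.as_numpy("text_output"))
```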

Maybe this is a simple configuration matter. I could provide all my config files if needed. Thanks~