biaochen
I met a similar issue. I'm deploying tritonserver on a T4 in a Docker container and running inference via the HTTP/gRPC endpoints. The models are TensorRT engines converted from ONNX. After the server is started, everything...
I guess it might be a configuration issue? Should I upload the configuration files?
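For reference, here is a minimal sketch of the ONNX-to-TensorRT conversion step mentioned above, using `trtexec`. The file names (`model.onnx`, `model.plan`) are placeholders; the precision flag and any shape options depend on your model:

```shell
# Build a TensorRT engine from an ONNX model (run on the same GPU
# architecture you will serve on, e.g. the T4, since engines are
# hardware-specific). --fp16 is optional and model-dependent.
trtexec --onnx=model.onnx --saveEngine=model.plan --fp16
```

The resulting `.plan` file then goes into the Triton model repository alongside its `config.pbtxt`.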
I've successfully installed and tested with v0.10.0; closing this issue.
The above issue is solved; I needed to set num_draft_tokens in the SpS request. I've seen a slight performance gain with this setting (Llama 3 8B vs 70B, 4×A100 80G). However, I...
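In case it helps others, a minimal Python sketch of passing num_draft_tokens in the request. This assumes the Triton HTTP generate endpoint and the `ensemble` model name; the field names other than num_draft_tokens (`text_input`, `max_tokens`) follow the common tensorrtllm_backend convention and may differ between versions:

```python
# Sketch: send num_draft_tokens (speculative-sampling draft length) with a
# request to Triton's HTTP generate endpoint. Endpoint path, model name, and
# field names besides num_draft_tokens are assumptions; check your backend's
# model config for the exact input names.
import json
import urllib.request


def build_payload(prompt: str, num_draft_tokens: int, max_tokens: int = 128) -> dict:
    """Build the request body; num_draft_tokens is the draft length per step."""
    return {
        "text_input": prompt,
        "max_tokens": max_tokens,
        "num_draft_tokens": num_draft_tokens,
    }


def generate(prompt: str, num_draft_tokens: int,
             url: str = "http://localhost:8000/v2/models/ensemble/generate") -> dict:
    body = json.dumps(build_payload(prompt, num_draft_tokens)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Tuning num_draft_tokens trades draft-model work against acceptance rate, so the best value depends on the model pair and batch size.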
Maybe this is a simple configuration matter. I can provide all my config files if needed. Thx~