Kris Hung
Hi @yoo-wonjun, regarding > When the problematic request completes, I check nvidia-smi and the memory is 5680 MiB, and when I repeat the request the memory is still 5680 MiB. I was...
@yoo-wonjun Thanks for the explanation. > If it depends on the framework you mentioned, does this mean it may be a problem that occurs when using TensorRT? I mean...
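If it helps to narrow this down, one way to watch GPU memory between requests is to poll `nvidia-smi` directly (a minimal sketch; it assumes a reasonably recent driver that supports these query flags):

```bash
# Report the GPU's used memory (in MiB) once per second while sending
# repeated requests, to see whether usage keeps growing or stays flat.
nvidia-smi --query-gpu=timestamp,memory.used --format=csv -l 1
```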
Closing due to lack of activity. Please re-open if you would like to follow up on this issue.
@tanmayv25 for vis.
Hi @MatthieuToulemont, the Triton TRT-LLM container is a special container that only contains the TRT-LLM backend and the Python backend. If you'd like to use other backends, you could try either...
No, the Python Backend should be the same.
@tricky61 The `nvcr.io/nvidia/tritonserver:24.05-py3` container contains the ONNX Runtime, TensorRT, and PyTorch backends. The `nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3` container only has the TRT-LLM and Python backends.
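If you want to confirm what a given image ships, you can list its backends directory (a quick check, assuming the stock 24.05 images):

```bash
# Each backend is a directory under /opt/tritonserver/backends.
docker run --rm nvcr.io/nvidia/tritonserver:24.05-py3 \
    ls /opt/tritonserver/backends
docker run --rm nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3 \
    ls /opt/tritonserver/backends
```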
@tricky61 It shouldn't make any difference. Note that you'd have to `pip install vllm` and make sure `model.py` exists under `/opt/tritonserver/backends/vllm_backend`.
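For reference, a minimal sketch of that setup inside the `24.05-py3` container; the source path used below (`/path/to/vllm_backend/src/model.py`) is a placeholder for wherever you have the vLLM backend's `model.py` checked out:

```bash
# Install vLLM into the container's Python environment.
pip install vllm

# Copy the vLLM backend's model.py to where Triton will look for it.
mkdir -p /opt/tritonserver/backends/vllm_backend
cp /path/to/vllm_backend/src/model.py /opt/tritonserver/backends/vllm_backend/
```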
@kaiyux Could you advise on what the approach for external contributions should be here?
Thanks @kaiyux! I can help with integrating into the internal repo once the changes are finalized. What steps need to be taken to properly credit the contributor?