tensorrtllm_backend
The Triton TensorRT-LLM Backend
Hello, our current model stack consists of a set of models built with TensorRT plus the Whisper ASR model. I'd like to use Triton Inference Server to host all of these...
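Hosting them together is mostly a matter of placing each model in the same Triton model repository; a single server process then serves them all. A minimal client-side sketch, assuming a server on localhost:8000 — "my_trt_model" and "whisper" are placeholder model names, not names from this repository:

```
import tritonclient.http as httpclient

# Hypothetical model names; substitute the names used in your model repository.
client = httpclient.InferenceServerClient(url="localhost:8000")

# One Triton instance serves every model in its repository; readiness can be
# checked per model before sending requests.
for model_name in ("my_trt_model", "whisper"):
    print(model_name, "ready:", client.is_model_ready(model_name))
```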
Deprecate old multinode tutorial link that is no longer relevant
Try to get valid output from the model, i.e. do not include the model's input in the output in the non-streaming case
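The inflight_batcher_llm example configs expose an optional `exclude_input_in_output` boolean input for exactly this; a hedged client sketch, assuming that input exists in your config.pbtxt (token ids and lengths below are placeholders):

```
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

input_ids = np.array([[1, 15043, 3186]], dtype=np.int32)  # placeholder tokens

def int32_input(name, value):
    t = httpclient.InferInput(name, list(value.shape), "INT32")
    t.set_data_from_numpy(value)
    return t

exclude = httpclient.InferInput("exclude_input_in_output", [1, 1], "BOOL")
exclude.set_data_from_numpy(np.array([[True]], dtype=bool))

inputs = [
    int32_input("input_ids", input_ids),
    int32_input("input_lengths", np.array([[input_ids.shape[1]]], dtype=np.int32)),
    int32_input("request_output_len", np.array([[64]], dtype=np.int32)),
    exclude,
]

result = client.infer("tensorrt_llm", inputs)
# With exclude_input_in_output=True, output_ids should hold only the
# generated completion, not the echoed prompt.
print(result.as_numpy("output_ids"))
```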
### System Info The TensorRT-LLM backend doesn't work for us because of this bug: https://github.com/triton-inference-server/tensorrtllm_backend/issues/598. So I have to use the [python backend](https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/all_models/inflight_batcher_llm/tensorrt_llm/1/model.py). However, it only supports detached models, which we don't...
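For reference, the python backend's decoupled response flow (what the post appears to call a "detached" model) looks roughly like this. A minimal sketch using the standard triton_python_backend_utils response-sender API, with placeholder tensor names; it requires `model_transaction_policy { decoupled: true }` in config.pbtxt:

```
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def execute(self, requests):
        # Decoupled mode: responses go through a sender instead of a return
        # value, so each request can receive zero, one, or many responses.
        for request in requests:
            sender = request.get_response_sender()
            in_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            out_tensor = pb_utils.Tensor("OUTPUT0", in_tensor.as_numpy())
            sender.send(pb_utils.InferenceResponse(output_tensors=[out_tensor]))
            # The FINAL flag closes the response stream for this request.
            sender.send(flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL)
        return None
```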
**Description** I have noticed a huge difference in memory usage for runtime buffers and the decoder between Llama 3 and Llama 3.1. **Triton Information** What version of Triton are you...
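One plausible explanation worth checking: Llama 3.1 raised the maximum context from 8K to 128K tokens, and KV-cache/runtime buffers scale linearly with the maximum sequence length the engine is built for. A back-of-the-envelope sketch, assuming Llama-3.1-8B-like dimensions (32 layers, 8 KV heads, head_dim 128, fp16):

```
# The KV cache holds a key and a value vector per layer, per KV head, per token.
def kv_cache_bytes(layers, kv_heads, head_dim, max_tokens, dtype_bytes=2):
    return 2 * layers * kv_heads * head_dim * max_tokens * dtype_bytes

for name, max_tokens in [("Llama 3 (8K)", 8192), ("Llama 3.1 (128K)", 131072)]:
    gib = kv_cache_bytes(32, 8, 128, max_tokens) / 2**30
    print(f"{name}: ~{gib:.1f} GiB of KV cache per sequence")
# -> ~1.0 GiB vs ~16.0 GiB: capping max_seq_len at build time shrinks buffers.
```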
### System Info CPU: Intel Core i7-14700K. GPU: RTX 4090. tensorrt_llm 0.13. Docker image: tritonserver:24.09-trtllm-python-py3 ### Who can help? @Tracin ### Information - [X] The official example scripts - [ ] My...
### System Info Ubuntu 22.04. Triton image: nvcr.io/nvidia/tritonserver:24.06-trtllm-python-py3 with trtllm-backend version 0.10.0. Model: qwen2-7b-instruct ### Who can help? _No response_ ### Information - [ ] The official...
Typo correction in launch_triton_server.py
### System Info NVIDIA-SMI 535.104.12, Driver Version: 535.104.12, CUDA Version: 12.5, base docker image: tritonserver:24.08-trtllm-python-py3 ### Who can help? I want to install torchaudio in `tritonserver:24.08-trtllm-python-py3`. A conflict occurred between...
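A common cause of this conflict is pip pulling a torchaudio wheel built against a different torch than the NGC container ships, then replacing the container's pinned torch. A hedged workaround sketch (the exact matching version must be looked up for your container):

```
# Check the torch build pinned inside the container first:
import torch
print(torch.__version__)  # NGC images ship a custom "a0"-suffixed build

# Then install a torchaudio wheel built for that torch, with --no-deps so
# pip cannot uninstall or replace the container's torch:
#   pip install --no-deps torchaudio==<version matching the torch above>
```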
I have two RTX 3090s. I want to launch TensorRT-LLM on each GPU separately. For this I have to change several things: 1. In tensorrt_llm/config.pbtxt:
```
instance_group { count: 0 kind:...
```
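An alternative sketch that avoids touching config.pbtxt at all: run one tritonserver process per GPU and pin each with CUDA_VISIBLE_DEVICES (the ports and the /models path below are placeholders):

```
import os
import subprocess

# One server per GPU; each process only sees the single 3090 assigned to it.
for gpu_id, base_port in [(0, 8000), (1, 8010)]:
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    subprocess.Popen(
        [
            "tritonserver",
            "--model-repository=/models",
            f"--http-port={base_port}",
            f"--grpc-port={base_port + 1}",
            f"--metrics-port={base_port + 2}",
        ],
        env=env,
    )
```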