
The Triton TensorRT-LLM Backend

Results: 251 tensorrtllm_backend issues, sorted by recently updated

Hello, so our current model stack consists of a set of models built in TensorRT and the Whisper ASR model. I'd like to use Triton server to host all of these...

triaged

Deprecate old multinode tutorial link that is no longer relevant

Try to get a valid output from the model, i.e. do not include the model's input in the output in the non-streaming case
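The idea behind this change can be sketched in plain Python: when a model returns the full sequence (prompt plus generation), the prompt tokens are sliced off the front so only newly generated tokens are returned. The function name and token values below are hypothetical, not from the backend's actual code:

```python
# Hypothetical sketch: drop the prompt tokens from a full output
# sequence so only the newly generated tokens are returned
# (the non-streaming case described in the issue title).
def strip_input_tokens(output_ids, input_len):
    """Return only the generated tokens.

    output_ids: full token sequence (prompt + generation).
    input_len:  number of prompt tokens to drop from the front.
    """
    return output_ids[input_len:]

# Example: a 4-token prompt followed by 3 generated tokens.
full_output = [101, 7592, 2088, 102, 42, 43, 44]
print(strip_input_tokens(full_output, 4))  # [42, 43, 44]
```

In the actual TensorRT-LLM backend this behavior is controlled by a model configuration parameter rather than applied by the client.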

### System Info tensorrtllm backend doesn't work for us because of this bug: https://github.com/triton-inference-server/tensorrtllm_backend/issues/598. So I have to use [python backend](https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/all_models/inflight_batcher_llm/tensorrt_llm/1/model.py). However, it only supports detached model which we don't...

bug

**Description** I have noticed that there is a huge difference in memory usage for runtime buffers and the decoder between llama3 and llama3.1. **Triton Information** What version of Triton are you...

### System Info cpu intel 14700k gpu rtx 4090 tensorrt_llm 0.13 docker tritonserver:24.09-trtllm-python-py3 ### Who can help? @Tracin ### Information - [X] The official example scripts - [ ] My...

bug

### System Info Ubuntu 22.04 Triton image: nvcr.io/nvidia/tritonserver:24.06-trtllm-python-py3 and the version of trtllm-backend is 0.10.0 Model: qwen2-7b-instruct ### Who can help? _No response_ ### Information - [ ] The official...

bug

Typo correction in launch_triton_server.py

### System Info NVIDIA-SMI 535.104.12 Driver Version: 535.104.12 CUDA Version: 12.5 base docker image: tritonserver:24.08-trtllm-python-py3 ### Who can help? I want to install torchaudio in `tritonserver:24.08-trtllm-python-py3`. A conflict occurred between...

bug

I have two RTX 3090s. I want to launch TensorRT-LLM on each GPU separately. To do this I have to change several things: 1. In tensorrt_llm/config.pbtxt: ``` instance_group { count: 0 kind:...
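For context, a per-GPU setup in Triton is typically expressed through the `instance_group` block of `config.pbtxt`. The fragment below is an illustrative sketch only (the truncated snippet above does not show the poster's full config); `kind: KIND_GPU` and the `gpus` field are standard Triton config options, and the specific values here are assumptions:

```protobuf
# Illustrative sketch: pin one model instance to GPU 0 in
# tensorrt_llm/config.pbtxt. A second copy of the model repository
# would pin its instance to GPU 1 in the same way.
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
```

Running one independent server (or model copy) per GPU this way avoids the backend's default behavior of spreading a single model across visible devices.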