tensorrtllm_backend
The Triton TensorRT-LLM Backend
Metrics "nv_inference_request_failure" value is always 0 even after getting 5xx at the client side
### System Info
- CPU architecture: x86_64
- GPU: A100-80GB
- CUDA version: 11
- TensorRT-LLM version: 0.9.0
- Triton server version: 2.46.0
- model...
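For the failure-counter question above, one way to see what Triton is actually reporting is to scrape its Prometheus metrics endpoint (served on port 8002 by default) and pull out the `nv_inference_request_failure` series. This is a minimal sketch; the metrics text below is an illustrative sample, not real server output.

```python
# Sketch: inspect nv_inference_request_failure in Triton's Prometheus
# metrics output (fetched from http://<host>:8002/metrics by default).
def failure_counts(metrics_text: str) -> dict:
    """Map each labelled nv_inference_request_failure series to its value."""
    counts = {}
    for line in metrics_text.splitlines():
        if line.startswith("nv_inference_request_failure"):
            labels, _, value = line.rpartition(" ")
            counts[labels] = float(value)
    return counts

# Illustrative sample of the Prometheus exposition format, not real output.
sample = """\
# HELP nv_inference_request_failure Number of failed inference requests
# TYPE nv_inference_request_failure counter
nv_inference_request_failure{model="ensemble",version="1"} 0
nv_inference_request_failure{model="tensorrt_llm",version="1"} 0
"""

print(failure_counts(sample))
```

If the counter stays at 0 while clients see 5xx errors, comparing the per-model series this way helps narrow down whether the failure is being recorded against a different model in the ensemble.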
### System Info
N/A

### Who can help?
@byshiue

### Information
- [X] The official example scripts
- [ ] My own modified scripts

### Tasks
- [X] An officially...
Phi 3 vision support exists in [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/multimodal#phi-3-vision), but not in tensorrtllm_backend. Could support be added to tensorrtllm_backend now that we have some multimodal support for Blip2 and Llava? Thanks for...
Hello, I am currently experiencing an issue with the `triton-inference-server/tensorrt_backend` while trying to run a Baichuan model.

### Description
I have set `gpt_model_type=inflight_fused_batching` in my model configuration, but when I...
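For reference, `gpt_model_type` is set in the `tensorrt_llm` model's `config.pbtxt`. The fragment below is a hypothetical sketch showing only where that parameter lives; all other fields are omitted and the value shown is the one from the report above.

```
# Hypothetical fragment of tensorrt_llm/config.pbtxt (other fields omitted).
parameters: {
  key: "gpt_model_type"
  value: {
    string_value: "inflight_fused_batching"
  }
}
```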
Deploying with in-flight batching (IFB), when the user sends a payload as follows:
```
{
  "text_input": str(question),
  "max_tokens": 512,
  "bad_words": "",
  "stop_words": stop_words,
  "pad_id": pad_id,
  "end_id": end_id,
  "top_p": 1,
  "id": "ggbond_test",
  "temperature":...
```
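As a sanity check, the payload above can be built and serialized as plain JSON before sending it to the server. The sketch below assumes placeholder values for `stop_words`, `pad_id`, and `end_id` (these are model- and tokenizer-specific) and an ensemble-style generate endpoint, which may differ per deployment.

```python
import json

# Sketch of a well-formed request body for a Triton generate endpoint
# (e.g. POST /v2/models/ensemble/generate). Values are placeholders;
# stop_words, pad_id, and end_id must match the deployed model's tokenizer.
question = "What is Triton?"
payload = {
    "text_input": str(question),
    "max_tokens": 512,
    "bad_words": "",
    "stop_words": ["</s>"],  # placeholder stop word
    "pad_id": 0,             # placeholder; use the tokenizer's pad token id
    "end_id": 2,             # placeholder; use the tokenizer's eos token id
    "top_p": 1,
    "id": "ggbond_test",
    "temperature": 1.0,
}

body = json.dumps(payload)
print(body)
```

Serializing with `json.dumps` (rather than pasting `str(...)` expressions into a template) guarantees the body the server receives is valid JSON with correctly typed fields.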
### System Info
- CPU: EPYC 7H12 (32 core)
- GPU: NVIDIA A100-SXM4-80GB

### Who can help?
_No response_

### Information
- [X] The official example scripts
- [ ]...
This PR will fix this [issue](https://github.com/triton-inference-server/tensorrtllm_backend/issues/580)
Greetings, I have come across the following issue when trying to build the TensorRT-LLM backend for Triton server:

**/home/nvidia/projects/triton-inference-server/tensorrtllm_backend/inflight_batcher_llm/../tensorrt_llm/cpp/include/tensorrt_llm/common/dataType.h:40:30: error: ‘kFP4’ is not a member of ‘nvinfer1::DataType’; did you mean ‘kFP8’?**

I...
### System Info
- CPU architecture: x86_64
- Host memory: 1 TB
- GPU: NVIDIA A100 80GB x8
- TensorRT-LLM version: v0.18.2
- Triton container: `nvcr.io/nvidia/tritonserver:25.04-trtllm-python-py3`
- CUDA:...