tensorrtllm_backend
The Triton TensorRT-LLM Backend
Metrics "nv_inference_request_failure" value is always 0 even after getting 5xx at the client side
### System Info
- CPU architecture: x86_64
- GPU: A100-80GB
- CUDA version: 11
- TensorRT-LLM version: 0.9.0
- Triton server version: 2.46.0
- model...
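For the failure-counter question above, one way to see what Triton is actually reporting is to scrape its Prometheus metrics endpoint (served on port 8002 by default) and pull out the `nv_inference_request_failure` series. This is a minimal sketch; the metrics text below is an illustrative sample, not real server output.

```python
# Sketch: inspect nv_inference_request_failure in Triton's Prometheus
# metrics output (fetched from http://<host>:8002/metrics by default).
def failure_counts(metrics_text: str) -> dict:
    """Map each labelled nv_inference_request_failure series to its value."""
    counts = {}
    for line in metrics_text.splitlines():
        if line.startswith("nv_inference_request_failure"):
            labels, _, value = line.rpartition(" ")
            counts[labels] = float(value)
    return counts

# Illustrative sample of the Prometheus exposition format, not real output.
sample = """\
# HELP nv_inference_request_failure Number of failed inference requests
# TYPE nv_inference_request_failure counter
nv_inference_request_failure{model="ensemble",version="1"} 0
nv_inference_request_failure{model="tensorrt_llm",version="1"} 0
"""

print(failure_counts(sample))
```

If the counter stays at 0 while clients see 5xx errors, comparing the per-model series this way helps narrow down whether the failure is being recorded against a different model in the ensemble.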
### System Info
N/A

### Who can help?
@byshiue

### Information
- [X] The official example scripts
- [ ] My own modified scripts

### Tasks
- [X] An officially...
Phi 3 vision support exists in [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/multimodal#phi-3-vision), but not in tensorrtllm_backend. Could support be added to tensorrtllm_backend now that we have some multimodal support for Blip2 and Llava? Thanks for...
Hello, I am currently experiencing an issue with the `triton-inference-server/tensorrt_backend` while trying to run a Baichuan model.

### Description
I have set `gpt_model_type=inflight_fused_batching` in my model configuration, but when I...
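For reference, `gpt_model_type` is set in the `tensorrt_llm` model's `config.pbtxt`. The fragment below is a hypothetical sketch showing only where that parameter lives; all other fields are omitted and the value shown is the one from the report above.

```
# Hypothetical fragment of tensorrt_llm/config.pbtxt (other fields omitted).
parameters: {
  key: "gpt_model_type"
  value: {
    string_value: "inflight_fused_batching"
  }
}
```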
Deploying with in-flight batching (IFB), when the user sends a payload as follows:
```
{
  "text_input": str(question),
  "max_tokens": 512,
  "bad_words": "",
  "stop_words": stop_words,
  "pad_id": pad_id,
  "end_id": end_id,
  "top_p": 1,
  "id": "ggbond_test",
  "temperature":...
```
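As a sanity check, the payload above can be built and serialized as plain JSON before sending it to the server. The sketch below assumes placeholder values for `stop_words`, `pad_id`, and `end_id` (these are model- and tokenizer-specific) and an ensemble-style generate endpoint, which may differ per deployment.

```python
import json

# Sketch of a well-formed request body for a Triton generate endpoint
# (e.g. POST /v2/models/ensemble/generate). Values are placeholders;
# stop_words, pad_id, and end_id must match the deployed model's tokenizer.
question = "What is Triton?"
payload = {
    "text_input": str(question),
    "max_tokens": 512,
    "bad_words": "",
    "stop_words": ["</s>"],  # placeholder stop word
    "pad_id": 0,             # placeholder; use the tokenizer's pad token id
    "end_id": 2,             # placeholder; use the tokenizer's eos token id
    "top_p": 1,
    "id": "ggbond_test",
    "temperature": 1.0,
}

body = json.dumps(payload)
print(body)
```

Serializing with `json.dumps` (rather than pasting `str(...)` expressions into a template) guarantees the body the server receives is valid JSON with correctly typed fields.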
### System Info
- CPU: EPYC 7H12 (32 core)
- GPU: NVIDIA A100-SXM4-80GB

### Who can help?
_No response_

### Information
- [X] The official example scripts
- [ ]...
This PR will fix this [issue](https://github.com/triton-inference-server/tensorrtllm_backend/issues/580)
Greetings, I have come across the following issue when trying to build the TensorRT-LLM backend for Triton server:

**/home/nvidia/projects/triton-inference-server/tensorrtllm_backend/inflight_batcher_llm/../tensorrt_llm/cpp/include/tensorrt_llm/common/dataType.h:40:30: error: ‘kFP4’ is not a member of ‘nvinfer1::DataType’; did you mean ‘kFP8’?**

I...
### System Info
- CPU architecture: x86_64
- Host memory: 1 TB
- GPU: NVIDIA A100 80GB x8
- TensorRT-LLM version: v0.18.2
- Triton container: `nvcr.io/nvidia/tritonserver:25.04-trtllm-python-py3`
- CUDA:...