TensorRT-LLM issues

when a model has layers with and without GPT plugin enabled, GptSession raises error

1

### System Info TensorRT-LLM: latest main branch built in the triton-trtllm container (23.12) GPU: V100 ### Who can help? @byshiue ### Information - [ ] The official example scripts -...

llan-ml

bug

triaged

Investigating

Generic Runtime

feat:[AutoDeploy] E2E build example for llama4 VLM

50

## Description Add a unit test as a small build example for the Llama4 multimodal model. Demonstrates 1. processing image and text inputs with `AutoProcessor.apply_chat_template()` 2. using `torch.cond` to accept both text+image...

Fridah-nv

AutoDeploy

How to obtain the classification label of a BERT model?

1

### System Info NVIDIA V100 nvcr.io/nvidia/tritonserver:23.10-trtllm-python-py3 ### Who can help? _No response_ ### Information - [X] The official example scripts - [ ] My own modified scripts ### Tasks -...

zhangjiawei5911

bug

triaged

Attention Pattern Matching with Inductor Utilities

Apply the approach in #4064 to attention pattern matching. This will greatly simplify our pattern matchers in this file.

lucaslie

AutoDeploy

Move existing transformation into new configurable system

see title and parent issue

lucaslie

AutoDeploy

doc: [TRTLLM-325]Integrate the NGC image in Makefile automation and document

3

# [TRTLLM-325]Integrate the NGC image in Makefile automation and document ## Description This PR adds automation for deploying images from NGC in [Makefile](../blob/main/docker/Makefile) and the corresponding documentation in [README.md](../blob/main/docker/README.md).

MartinMarciniszyn

[AutoDeploy] Dashboard Scale Up

Scaling up the AutoDeploy dashboard to better track model coverage

lucaslie

AutoDeploy

[TRTLLM-4932] Add QA accuracy tests for NIM-prioritized models

12

New tests added:
- Llama-3.2-1B: added MMLU benchmark
- Llama-3.1-Nemotron-Nano-8B-v1: added GSM8K, GPQADiamond benchmarks
- Llama-3_1-Nemotron-Ultra-253B-v1: added the entire model (FP8 variant is being added to `ftp/llm-models`)
- Phi-4-mini-instruct: added...

moraxu

Feat: add chunked-attention kernels on Blackwell

3

# Feat: add chunked-attention kernels on Blackwell

PerkzZheng

Check test names in waive list

22


EmmaQiaoCh
