TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficientl...
### System Info TensorRT-LLM: latest main branch built in the triton-trtllm container (23.12) GPU: V100 ### Who can help? @byshiue ### Information - [ ] The official example scripts -...
## Description Add a unit test as a small build example for the Llama4 MultiModal Model. Demonstrates 1. processing image and test inputs with `AutoProcessor.apply_chat_template()` 2. using `torch.cond` to accept both text+image...
### System Info NVIDIA V100 nvcr.io/nvidia/tritonserver:23.10-trtllm-python-py3 ### Who can help? _No response_ ### Information - [X] The official example scripts - [ ] My own modified scripts ### Tasks -...
Apply the approach from #4064 to attention pattern matching. This will greatly simplify the pattern matchers in this file.
# [TRTLLM-325] Integrate the NGC image in Makefile automation and document ## Description This PR adds automation for deploying images from NGC in the [Makefile](../blob/main/docker/Makefile) and adds the corresponding documentation to [README.md](../blob/main/docker/README.md).
Scaling up the AutoDeploy dashboard to better track model coverage
New tests added: - Llama-3.2-1B: added MMLU benchmark - Llama-3.1-Nemotron-Nano-8B-v1: added GSM8K, GPQADiamond benchmarks - Llama-3_1-Nemotron-Ultra-253B-v1: added the entire model (FP8 variant is being added to `ftp/llm-models`) - Phi-4-mini-instruct: added...
# Feat: add chunked-attention kernels on Blackwell