
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficientl...

Results 937 TensorRT-LLM issues

## Description 1. An example of using torch._inductor.pattern_matcher to match RoPE with explicit cos/sin pattern 2. Wrap up in utility file ## Test Coverage ## GitHub Bot Help `/bot [-h]...

AutoDeploy

## Description Add CLI accuracy tests for Llama-3.3-70B-Instruct and LLM API BF16 variant. CLI accuracy tests are still needed because NIMs use TRT-LLM's TRT backend for now. ## Test Coverage...

Hello! I have the latest version of TensorRT-LLM installed via pip. According to your documentation, the feature I’m trying to use should be available, but it doesn’t seem to work....

### System Info - GPU Properties NVIDIA H100 - Libraries v0.20.0rc1 - Docker ### Who can help? _No response_ ### Information - [x] The official example scripts - [x] My...

bug

Big thanks to the DeepSeek team for their awesome work! Recently, large-scale fine-grained MoE models have been gaining popularity, but they also bring new optimization challenges (and opportunities) for LLM...

Community Engagement

### System Info All systems ### Who can help? @kaiyux ### Information - [x] The official example scripts - [x] My own modified scripts ### Tasks - [x] An officially...

bug

### System Info NVIDIA H20 TensorRT-LLM version: 0.19.0.dev2025041500 ### Who can help? _No response_ ### Information - [x] The official example scripts - [ ] My own modified scripts ###...

bug

# Expand PyT `llama_v3.1_nemotron_nano_8b` perf tests coverage ## Description This PR adds end-to-end performance results for the **llama_v3.1_nemotron_nano_8b** bfloat16 engine on 1 H100. Two broad load patterns were evaluated on...

### System Info RuntimeError: Failed to import transformers.models.bert.modeling_bert because of the following error (look up to see its traceback): /usr/local/lib/python3.12/dist-packages/flash_attn_2_cuda.cpython-312-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC2ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE ### Who can help? _No response_ ###...

bug
need more info

## Description Add an approve list to the pickle deserialization process, restricting it to a subset of supported objects to reduce the attack surface of using pickle. ## Test Coverage if a...
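
The approve-list idea in the last entry can be illustrated with Python's standard `pickle.Unpickler.find_class` hook. This is a minimal sketch of the general technique, not the code from that PR: the names `APPROVED`, `ApprovedUnpickler`, and `approved_loads` are hypothetical, and the single approved entry is chosen only for illustration.

```python
import io
import pickle
from collections import OrderedDict

# Hypothetical approve list: (module, name) pairs that may be resolved
# during unpickling. Anything else is rejected.
APPROVED = {
    ("collections", "OrderedDict"),
}


class ApprovedUnpickler(pickle.Unpickler):
    """Unpickler that only resolves globals found on the approve list."""

    def find_class(self, module, name):
        # find_class is called whenever the pickle stream references a
        # global (a class or function); plain data never reaches it.
        if (module, name) in APPROVED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(
            f"{module}.{name} is not on the approve list"
        )


def approved_loads(data: bytes):
    """Drop-in analogue of pickle.loads that enforces the approve list."""
    return ApprovedUnpickler(io.BytesIO(data)).load()


if __name__ == "__main__":
    ok = approved_loads(pickle.dumps(OrderedDict(a=1)))
    print(ok)
    try:
        approved_loads(pickle.dumps(print))  # builtins.print is not approved
    except pickle.UnpicklingError as exc:
        print("rejected:", exc)
```

Built-in containers and scalars (lists, dicts, strings, numbers) load without consulting the approve list because they are encoded by dedicated pickle opcodes; only references to classes and functions go through `find_class`.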