TensorRT-LLM issues

feat:[AutoDeploy] utilize torch._inductor.pattern_matcher to write pattern matcher

3

Description: 1. An example of using `torch._inductor.pattern_matcher` to match RoPE with an explicit cos/sin pattern. 2. Wrap it up in a utility file. Test Coverage: (empty) GitHub Bot Help: `/bot [-h]...

Fridah-nv

AutoDeploy

[TRTLLM-4932] Add CLI accuracy tests for Llama-3.3-70B-Instruct and LLM API BF16 variant

16

Description: Add CLI accuracy tests for Llama-3.3-70B-Instruct and the LLM API BF16 variant. CLI accuracy tests are still needed because NIMs currently use TRT-LLM's TRT backend. Test Coverage: ...

moraxu

LLM.generate() got an unexpected keyword argument 'kv_cache_retention_config'

1

Hello! I have the latest version of TensorRT-LLM installed via pip. According to your documentation, the feature I’m trying to use should be available, but it doesn’t seem to work....

volodymyrhotsiy

No response to requests when using Multi-node --pp_size > 1 in a multi-node environment

System Info: GPU properties: NVIDIA H100; libraries: v0.20.0rc1; Docker. Who can help? No response. Information: [x] The official example scripts, [x] My...

Archmilio

bug

[Call for contributions] The development plan of large-scale EP support in TensorRT-LLM

1

Big thanks to the DeepSeek team for their awesome works! Recently, large-scale fine-grained MoE models have been gaining popularity, but they also bring new optimization challenges (and opportunities) for LLM...

juney-nvidia

Community Engagement

[bug] Lookahead spec-dec verifies w guesses instead of g

1

System Info: All systems. Who can help? @kaiyux. Information: [x] The official example scripts, [x] My own modified scripts. Tasks: [x] An officially...

mahmoudhas

bug

Context node crash when using PD Disaggregation

1

System Info: NVIDIA H20, TensorRT-LLM version 0.19.0.dev2025041500. Who can help? No response. Information: [x] The official example scripts, [ ] My own modified scripts...

nsealati

bug

test(perf): Extend the Llama-Nemotron-Nano-8B perf-integration-tests (pyt)

Expand PyT `llama_v3.1_nemotron_nano_8b` perf test coverage. Description: This PR adds end-to-end performance results for the `llama_v3.1_nemotron_nano_8b` bfloat16 engine on 1 H100. Two broad load patterns were evaluated on...

venkywonka

user trtllm-serve error

1

System Info: `RuntimeError: Failed to import transformers.models.bert.modeling_bert because of the following error (look up to see its traceback): /usr/local/lib/python3.12/dist-packages/flash_attn_2_cuda.cpython-312-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC2ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE` Who can help? No response. ...

SafeCool

bug

need more info

fix: [nvbugs/5066257] serialization improvements

103

Description: Add an approve list to the pickle deserialization process so that only a subset of supported objects can be deserialized, reducing the attack surface of using pickle. Test Coverage: if a...

coldwaterq
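The approve-list idea in the serialization PR corresponds to a standard Python technique: subclass `pickle.Unpickler` and override `find_class`, which the unpickler calls for every class or callable a pickle stream references. The following is a minimal sketch, assuming the approve list is a set of `(module, name)` pairs. `APPROVED_GLOBALS`, `ApprovedUnpickler`, and `approved_loads` are hypothetical names for illustration, not the PR's actual code.

```python
import io
import pickle

# Hypothetical approve list: (module, qualified name) pairs that may be
# resolved during unpickling. Any other global is rejected.
APPROVED_GLOBALS = {
    ("builtins", "dict"),
    ("builtins", "list"),
    ("collections", "OrderedDict"),
}


class ApprovedUnpickler(pickle.Unpickler):
    """Unpickler that only resolves globals found in an approve list."""

    def __init__(self, file, approved=APPROVED_GLOBALS):
        super().__init__(file)
        self._approved = approved

    def find_class(self, module, name):
        # Called for every global (class or function) referenced by the
        # stream; refusing unknown names blocks payloads such as builtins.eval.
        if (module, name) in self._approved:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is not approved")


def approved_loads(data: bytes, approved=APPROVED_GLOBALS):
    """Replacement for pickle.loads restricted to an approve list."""
    return ApprovedUnpickler(io.BytesIO(data), approved).load()
```

With the default protocol, built-in values such as `dict`, `list`, `tuple`, `str`, and `int` are encoded with dedicated opcodes and never reach `find_class`, so the approve list mainly needs to name classes and callables that the serialized objects depend on.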
