TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently.
# [feat] Enable KV cache to be reused during request generation
Issue: [issues/3733](https://github.com/NVIDIA/TensorRT-LLM/issues/3733)
## Description
This PR enhances the KV cache reuse logic...
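To make the idea of KV cache reuse concrete, here is a hypothetical sketch of block-level prefix reuse (this is an illustration of the general technique, not TensorRT-LLM's actual implementation; `BlockKVCache`, `BLOCK_SIZE`, and `match_and_insert` are invented names): prompt tokens are split into fixed-size blocks, each block is keyed by a hash of the whole prefix ending at that block, and a new request reuses any cached prefix blocks it shares with earlier requests.

```python
# Hypothetical sketch of block-level KV cache reuse (invented names,
# not TensorRT-LLM's real data structures).

BLOCK_SIZE = 4  # tokens per cache block

class BlockKVCache:
    def __init__(self):
        self._blocks = {}  # prefix-hash -> placeholder for real KV data

    def _block_keys(self, tokens):
        keys = []
        full = len(tokens) - len(tokens) % BLOCK_SIZE
        for start in range(0, full, BLOCK_SIZE):
            # Key each block by the entire prefix so that a hash match
            # implies an identical attention context for that block.
            keys.append(hash(tuple(tokens[:start + BLOCK_SIZE])))
        return keys

    def match_and_insert(self, tokens):
        """Return (reused_blocks, computed_blocks) for this request."""
        reused = computed = 0
        for key in self._block_keys(tokens):
            if key in self._blocks:
                reused += 1
            else:
                self._blocks[key] = object()  # stand-in for real KV tensors
                computed += 1
        return reused, computed

cache = BlockKVCache()
print(cache.match_and_insert([1, 2, 3, 4, 5, 6, 7, 8]))     # (0, 2): cold cache
print(cache.match_and_insert([1, 2, 3, 4, 9, 10, 11, 12]))  # (1, 1): first block reused
```

The second request shares its first four-token block with the first request, so only one new block's KV values would need to be computed.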
# PR title
Please write the PR title following this template: [JIRA ticket link/nvbug link/github issue link][fix/feat/doc/infra/...]
For example, assume I have a PR hoping to support a new...
## Description
End-to-end support for the Ngram drafter in the PyTorch workflow (previously named Prompt-Lookup-Decoding (PLD) in the TRT workflow). Usage example:

```bash
python examples/pytorch/quickstart_advanced.py \
    --spec_decode_algo NGRAM \
    --spec_decode_nextn 4 \
    --max_matching_ngram_size...
```
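The core idea behind n-gram prompt-lookup drafting can be sketched in a few lines (this is a generic illustration of the technique, not the PR's code; `ngram_draft` and its parameters are invented names): search the existing token sequence for the most recent earlier occurrence of its trailing n-gram, and propose the tokens that followed that occurrence as draft tokens for the target model to verify.

```python
# Generic sketch of n-gram prompt-lookup drafting (invented helper,
# not TensorRT-LLM's Ngram drafter implementation).

def ngram_draft(tokens, max_ngram=3, num_draft=4):
    """Return up to `num_draft` speculative tokens, or [] if no match."""
    # Try the longest trailing n-gram first, then shorter ones.
    for n in range(min(max_ngram, len(tokens) - 1), 0, -1):
        tail = tokens[-n:]
        # Scan earlier positions, newest first, for the same n-gram.
        for start in range(len(tokens) - n - 1, -1, -1):
            if tokens[start:start + n] == tail:
                follow = tokens[start + n:start + n + num_draft]
                if follow:
                    return follow
    return []

toks = ["the", "cat", "sat", "on", "the"]
print(ngram_draft(toks))  # ['cat', 'sat', 'on', 'the']
```

Here the trailing "the" matches the sequence's first token, so the four tokens that followed it are proposed as the draft, which the target model then accepts or rejects in a single verification step.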
This PR, in conjunction with [PR 3769](https://github.com/NVIDIA/TensorRT-LLM/pull/3769), provides an interface for dynamically linking NIXL.
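As a generic illustration of what runtime dynamic linking looks like (NIXL's actual interface and symbol names are not part of this excerpt, so the standard math library `libm` is used as a stand-in): the library path is resolved at runtime rather than at build time, and symbols are looked up and typed before use.

```python
# Generic runtime dynamic-linking illustration via ctypes; libm stands in
# for the dynamically linked library, since NIXL's symbols are not shown here.
import ctypes
import ctypes.util

# Resolve the shared library at runtime instead of link time.
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")

# Declare the signature of the symbol we want to call.
libm.cos.restype = ctypes.c_double
libm.cos.argtypes = [ctypes.c_double]

print(libm.cos(0.0))  # 1.0
```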
Previous PR: https://github.com/NVIDIA/TensorRT-LLM/pull/3851
Previous revert PR: https://github.com/NVIDIA/TensorRT-LLM/pull/4340
# Remove vila test from backend tests
# Add llama4 disagg accuracy test
`[05/14/2025-17:25:09] [TRT-LLM] [I] MMLU weighted average accuracy: 80.38 (4104)`
# Support MCP in TensorRT-LLM Scaffolding
Issue: [#3335](https://github.com/NVIDIA/TensorRT-LLM/issues/3335)
## Description
MCP provides a standard tool-use capability to TensorRT-LLM, making LLM function calls usable through a common protocol.
## Examples
Run an MCP server...
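For readers unfamiliar with MCP, a sketch of the message shape involved may help (this reflects my understanding of the MCP specification's JSON-RPC `tools/call` method; the Scaffolding client code in the actual PR is not shown here, and `make_tool_call` and the tool name are invented for illustration):

```python
# Sketch of an MCP tool-invocation request (JSON-RPC 2.0, `tools/call`),
# per the MCP spec as understood here; helper and tool names are invented.
import json

def make_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` JSON-RPC request body."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

msg = make_tool_call(1, "web_search", {"query": "TensorRT-LLM"})
print(json.dumps(msg))
```

A scaffolding layer would serialize such a request to an MCP server, then feed the tool's result back into the LLM's context to complete the function call.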
## Description
There were two bugs in the tests:
- `moe_backend` was not passed to `PyTorchConfig`
- `batch_size` should have been `max_batch_size`

After fixing these, I reran the tests and...
# [TRTLLM-5273][feat] Use full attention mask if Llama3 is used as encoder, and fix EarlyStopDecoder unsqueeze bug
## Description
This PR adds a `bidirectional_attention` flag to `modeling_llama.py`. This is...
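To illustrate what such a flag changes (an assumed sketch of the general concept, not the PR's code; `attention_mask` and its signature are invented): a decoder uses a causal lower-triangular mask, while encoder-style bidirectional attention lets every token attend to every other token.

```python
# Illustrative causal vs. full (bidirectional) attention masks;
# invented helper, not the modeling_llama.py implementation.

def attention_mask(seq_len, bidirectional):
    """1 = position may be attended to, 0 = masked out."""
    if bidirectional:
        # Encoder use: every token attends to every position.
        return [[1] * seq_len for _ in range(seq_len)]
    # Decoder use: token i attends only to positions j <= i.
    return [[1 if j <= i else 0 for j in range(seq_len)] for i in range(seq_len)]

print(attention_mask(3, bidirectional=False))  # [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
print(attention_mask(3, bidirectional=True))   # [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
```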