
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently.

Results: 937 TensorRT-LLM issues, sorted by recently updated

…rch_streaming to cover multi-beam streaming cases required by the NIM team.

## Summary by CodeRabbit
* **Tests**
  * Added comprehensive test coverage for beam search functionality with streaming support in…

## Summary by CodeRabbit

## Release Notes
* **Documentation**
  * Updated draft model naming labels and references in documentation for consistency.

## Description

## Test Coverage

## PR Checklist
Please…

Community want to contribute

Fixes #9154
Fixes #8948

## Summary by CodeRabbit
* **New Features**
  * Added manual tensor parallelism sharding configuration option for auto-deployment workflows. Users now have granular control over how individual…

AutoDeploy

## Summary by CodeRabbit
* **Improvements**
  * Enhanced timeout messaging for KV cache transfer operations.
* **Tests**
  * Updated KV cache transfer backend configuration in test cases.
  * Re-enabled previously…

Based on PR https://github.com/NVIDIA/TensorRT-LLM/pull/9376 from @chang-l, with minor changes to support KV cache reuse.

@coderabbitai summary

## Description

## Test Coverage

## PR Checklist
Please review the following before submitting…

## Summary by CodeRabbit

## Release Notes
* **New Features**
  * Added CI-specific image tagging functionality for improved version management in continuous integration builds.
  * Introduced dedicated CI build pipeline…

### 🚀 The feature, motivation and pitch

Now that PT LlmArgs have mostly stabilized, let's see if we can more closely align AD LlmArgs with PT LlmArgs:
1. Deprecate `AutoDeployConfig`…

feature request
AutoDeploy