TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficientl...
### 🚀 The feature, motivation and pitch Scoping out what it takes to support speculative decoding with overlap scheduling in AutoDeploy. ### Alternatives _No response_ ### Additional context The overlap...
## Summary by CodeRabbit - Refactor - Simplified KV cache APIs by removing the onboard_blocks option; onboarding/offloading now handled automatically. - Updated C++ and Python constructor signatures (and property bindings)...
## Description This change adds two tests to help ensure that third-party C++ code is integrated according to the process described in 3rdparty/cpp-thirdparty.md. The desired process requires folks to use...
### System Info NVIDIA A100 80GB PCIe Driver Version: 570.172.08 CUDA Version: 12.8 Ubuntu 22.04 jammy Container launched with following docker compose file: ``` services: tensorrt-llm: image: nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc2 container_name: tensorrt-llm-container...
@coderabbitai summary ## Description ## Test Coverage ## PR Checklist Please review the following before submitting your PR: - PR description clearly explains what and why. If using CodeRabbit's summary,...
### 🚀 The feature, motivation and pitch Test 2-model (and later 1-model) spec dec with TP > 1. Maybe this test here can be extended: https://github.com/NVIDIA/TensorRT-LLM/pull/9275/files#r2557977057 ### Alternatives _No response_...
### System Info - H100 ### Who can help? _No response_ ### Information - [ ] The official example scripts - [ ] My own modified scripts ### Tasks -...
## Description Avoid recomputing softmax after top-p prob masking. ## PR Checklist Please review the following before submitting your PR: - PR description clearly explains what and why. If using...
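The description above says only that softmax should not be recomputed after top-p masking. As a rough illustration of why that is safe, the sketch below compares two ways of getting the post-masking distribution: dividing the kept probabilities by their sum, and re-running softmax over masked logits. It is plain Python, not the TensorRT-LLM implementation, and every function name in it is hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_mask(probs, top_p):
    """Flag the smallest set of highest-probability tokens whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = [False] * len(probs)
    cumulative = 0.0
    for i in order:
        keep[i] = True
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return keep


def renormalize(probs, keep):
    """Cheap path: zero the masked entries and divide by the kept mass."""
    kept_mass = sum(p for p, k in zip(probs, keep) if k)
    return [p / kept_mass if k else 0.0 for p, k in zip(probs, keep)]


def resoftmax(logits, keep):
    """Expensive path: set masked logits to -inf and run softmax again."""
    masked = [x if k else -math.inf for x, k in zip(logits, keep)]
    return softmax(masked)


# Illustrative values only; not taken from the PR.
logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax(logits)
keep = top_p_mask(probs, top_p=0.8)
cheap = renormalize(probs, keep)
expensive = resoftmax(logits, keep)
```

Both paths agree up to floating-point error, because the softmax denominator cancels once the kept probabilities are divided by their own sum. The division reuses the probabilities already computed for the mask, so the second exponentiation pass is unnecessary.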
@coderabbitai summary ## Description Add a LoRA adapter and perf test for the PyTorch backend implementation of StarCoder2. ## Test Coverage ## PR Checklist Please review the following before submitting your...
## Summary by CodeRabbit * **Bug Fixes** * Enhanced reliability of streaming generation responses with automatic retry logic for transient connection failures. * Improved error messages to include token count...