[None][chore] Remove onboard block switch for KV cache manager
Summary by CodeRabbit
-
Refactor
- Simplified KV cache APIs by removing the onboard_blocks option; onboarding/offloading now handled automatically.
- Updated C++ and Python constructor signatures (and property bindings) to exclude onboard_blocks; parameter order adjusted accordingly.
- Removed onboard_blocks from serialization/pickling formats; saved state no longer includes this field.
-
Tests
- Updated unit tests to align with the streamlined APIs and serialization changes.
Description
This MR has no functional change intended.
Dead code elimination. The secondary block pool is derived when kv_cache_config::host_cache_size is specified. Whether we onboard/offload a kv cache block can be implicated from whether the manager has secondary block or not. The onboardBlocks toggle itself only adds complication. This commit removes it.
Test Coverage
Since no functional change is intended. No test change is needed.
PR Checklist
Please review the following before submitting your PR:
-
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
-
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
-
Test cases are provided for new code paths (see test instructions)
-
Any new dependencies have been scanned for license and vulnerabilities
-
CODEOWNERS updated if ownership changes
-
Documentation updated as needed
-
The reviewers assigned automatically/manually are appropriate for the PR.
-
[x] Please check this after reviewing the above items as appropriate for this PR.
📝 Walkthrough
Walkthrough
Removes the onboardBlocks parameter and related logic from KV cache components across headers, implementations, bindings, serialization, and tests. Constructor signatures and parameter ordering are updated accordingly. Offload/onboard gating tied to onboardBlocks is eliminated. Python bindings and serialization schemas drop the onboard_blocks field. Tests adjusted to new APIs.
Changes
| Cohort / File(s) | Summary |
|---|---|
KV cache header API updatescpp/include/tensorrt_llm/batch_manager/kvCacheManager.h |
Removed onboardBlocks from WindowBlockManager, BlockManager, KVCacheManager constructors; reordered parameters; removed member storing onboard policy. |
Executor config headercpp/include/tensorrt_llm/executor/executor.h |
KvCacheConfig: removed onboardBlocks ctor param, getter/setter, and private member. |
KV cache implementationcpp/tensorrt_llm/batch_manager/kvCacheManager.cpp |
Purged onboarding gating; updated constructors and internal calls; adjusted logging; offload/onboard decisions no longer depend on onboard flag. |
Inflight batching usagecpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.cpp |
Removed runtime guard requiring onboard blocks for certain FMHA; updated KVCacheManager construction to new signature. |
Executor config implcpp/tensorrt_llm/executor/kvCacheConfig.cpp |
KvCacheConfig ctor drops onboardBlocks; reorders params; removes getter/setter implementations and member init. |
Serialization changescpp/tensorrt_llm/executor/serialization.cpp |
Removed onboardBlocks from serialize/deserialize and size calculations; updated KvCacheConfig construction order. |
Nanobind (C++/Python) executor configcpp/tensorrt_llm/nanobind/executor/executorConfig.cpp |
Dropped onboard_blocks property and getstate element; note: setstate may still expect previous tuple size. |
Pybind KV cache bindingscpp/tensorrt_llm/pybind/batch_manager/kvCacheManager.cpp |
Removed onboard_blocks arg from exposed KVCacheManager ctors; adjusted py::init signatures (two bools → one before CacheType). |
Pybind executor configcpp/tensorrt_llm/pybind/executor/executorConfig.cpp |
Removed onboard_blocks property; updated pickling tuple to exclude it. |
Unit tests: KV cachecpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp |
Updated ctor calls to omit onboardBlocks; one merge-conflict artifact present; logic otherwise unchanged. |
Unit tests: serializationcpp/tests/unit_tests/executor/serializeUtilsTest.cpp |
Removed assertion on onboardBlocks in KvCacheConfig serialization test. |
Sequence Diagram(s)
sequenceDiagram
autonumber
participant Client
participant KVCacheManager
participant BlockManager
participant PrimaryPool
participant SecondaryPool
Client->>KVCacheManager: requestBlock()
KVCacheManager->>BlockManager: getFreeBlock()
alt primary has free block
BlockManager->>PrimaryPool: allocate()
PrimaryPool-->>BlockManager: block
else primary needs space
BlockManager->>SecondaryPool: offload eligible blocks
SecondaryPool-->>BlockManager: offloaded
BlockManager->>PrimaryPool: allocate()
PrimaryPool-->>BlockManager: block
end
BlockManager-->>KVCacheManager: block
KVCacheManager-->>Client: block
note over BlockManager,SecondaryPool: Onboarding/offloading no longer gated by onboardBlocks flag
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~60 minutes
Possibly related PRs
- NVIDIA/TensorRT-LLM#6249 — Also edits KV cache managers’ constructor signatures, removing a boolean; overlaps same classes and APIs.
- NVIDIA/TensorRT-LLM#6244 — Modifies KVCacheManager/WindowBlockManager/BlockManager APIs and cache pool behavior; touches same headers.
- NVIDIA/TensorRT-LLM#6563 — Changes KvCacheConfig/KV cache interfaces (adds attention-DP fields); intersects with files updated here.
Suggested labels
KV-Cache Management
Suggested reviewers
- tomeras91
- Tabrizian
- thorjohnsen
- achartier
- Funatiq
✨ Finishing Touches
- [ ] 📝 Generate Docstrings
🧪 Generate unit tests
- [ ] Create PR with unit tests
- [ ] Post copyable unit tests in a comment
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
🪧 Tips
Chat
There are 3 ways to chat with CodeRabbit:
- Review comments: Directly reply to a review comment made by CodeRabbit. Example:
I pushed a fix in commit <commit_id>, please review it.Open a follow-up GitHub issue for this discussion.
- Files and specific lines of code (under the "Files changed" tab): Tag
@coderabbitaiin a new review comment at the desired location with your query. - PR comments: Tag
@coderabbitaiin a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:@coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.@coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
Support
Need help? Create a ticket on our support page for assistance with any issues or questions.
CodeRabbit Commands (Invoked using PR/Issue comments)
Type @coderabbitai help to get the list of available commands.
Other keywords and placeholders
- Add
@coderabbitai ignoreor@coderabbit ignoreanywhere in the PR description to prevent this PR from being reviewed. - Add
@coderabbitai summaryto generate the high-level summary at a specific location in the PR description. - Add
@coderabbitaior@coderabbitai titleanywhere in the PR title to generate the title automatically.
Status, Documentation and Community
- Visit our Status Page to check the current availability of CodeRabbit.
- Visit our Documentation for detailed information on how to use CodeRabbit.
- Join our Discord Community to get help, request features, and share feedback.
- Follow us on X/Twitter for updates and announcements.
/bot run --disable-fail-fast
PR_Github #17338 [ run ] triggered by Bot
PR_Github #17338 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #13031 completed with status: 'FAILURE'
/bot run --disable-fail-fast
PR_Github #17361 [ run ] triggered by Bot
PR_Github #17361 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #13049 completed with status: 'FAILURE'
/bot run
PR_Github #17484 [ run ] triggered by Bot
PR_Github #17484 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #13139 completed with status: 'FAILURE'
/bot run
/bot run
PR_Github #17540 [ run ] triggered by Bot
PR_Github #17540 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #13186 completed with status: 'FAILURE'
/bot run --disable-fail-fast
PR_Github #20768 [ run ] triggered by Bot
PR_Github #20768 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #15696 completed with status: 'FAILURE'
/bot run --disable-fail-fast
PR_Github #21008 [ run ] triggered by Bot
PR_Github #21008 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #15884 completed with status: 'FAILURE'
/bot run --disable-fail-fast
PR_Github #21251 [ run ] triggered by Bot
PR_Github #21251 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #16042 completed with status: 'FAILURE'
/bot run --disable-fail-fast
PR_Github #21257 [ run ] triggered by Bot
PR_Github #21257 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #16048 completed with status: 'FAILURE'
/bot run --disable-fail-fast
PR_Github #21267 [ run ] triggered by Bot
PR_Github #21267 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #16057 completed with status: 'FAILURE'
/bot run --disable-fail-fast