TensorRT-LLM [None][chore] Remove onboard block switch for KV cache manager

Summary by CodeRabbit

Refactor
- Simplified KV cache APIs by removing the onboard_blocks option; onboarding/offloading now handled automatically.
- Updated C++ and Python constructor signatures (and property bindings) to exclude onboard_blocks; parameter order adjusted accordingly.
- Removed onboard_blocks from serialization/pickling formats; saved state no longer includes this field.
Tests
- Updated unit tests to align with the streamlined APIs and serialization changes.

Description

This MR has no functional change intended.

Dead code elimination. The secondary block pool is derived when kv_cache_config::host_cache_size is specified. Whether we onboard/offload a kv cache block can be implicated from whether the manager has secondary block or not. The onboardBlocks toggle itself only adds complication. This commit removes it.

Test Coverage

Since no functional change is intended. No test change is needed.

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
The reviewers assigned automatically/manually are appropriate for the PR.
[x] Please check this after reviewing the above items as appropriate for this PR.

Sep 02 '25 07:09 eopXD

📝 Walkthrough

Walkthrough

Removes the onboardBlocks parameter and related logic from KV cache components across headers, implementations, bindings, serialization, and tests. Constructor signatures and parameter ordering are updated accordingly. Offload/onboard gating tied to onboardBlocks is eliminated. Python bindings and serialization schemas drop the onboard_blocks field. Tests adjusted to new APIs.

Changes

Cohort / File(s)	Summary
KV cache header API updates `cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h`	Removed onboardBlocks from WindowBlockManager, BlockManager, KVCacheManager constructors; reordered parameters; removed member storing onboard policy.
Executor config header `cpp/include/tensorrt_llm/executor/executor.h`	KvCacheConfig: removed onboardBlocks ctor param, getter/setter, and private member.
KV cache implementation `cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp`	Purged onboarding gating; updated constructors and internal calls; adjusted logging; offload/onboard decisions no longer depend on onboard flag.
Inflight batching usage `cpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.cpp`	Removed runtime guard requiring onboard blocks for certain FMHA; updated KVCacheManager construction to new signature.
Executor config impl `cpp/tensorrt_llm/executor/kvCacheConfig.cpp`	KvCacheConfig ctor drops onboardBlocks; reorders params; removes getter/setter implementations and member init.
Serialization changes `cpp/tensorrt_llm/executor/serialization.cpp`	Removed onboardBlocks from serialize/deserialize and size calculations; updated KvCacheConfig construction order.
Nanobind (C++/Python) executor config `cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp`	Dropped onboard_blocks property and getstate element; note: setstate may still expect previous tuple size.
Pybind KV cache bindings `cpp/tensorrt_llm/pybind/batch_manager/kvCacheManager.cpp`	Removed onboard_blocks arg from exposed KVCacheManager ctors; adjusted py::init signatures (two bools → one before CacheType).
Pybind executor config `cpp/tensorrt_llm/pybind/executor/executorConfig.cpp`	Removed onboard_blocks property; updated pickling tuple to exclude it.
Unit tests: KV cache `cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp`	Updated ctor calls to omit onboardBlocks; one merge-conflict artifact present; logic otherwise unchanged.
Unit tests: serialization `cpp/tests/unit_tests/executor/serializeUtilsTest.cpp`	Removed assertion on onboardBlocks in KvCacheConfig serialization test.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client
  participant KVCacheManager
  participant BlockManager
  participant PrimaryPool
  participant SecondaryPool

  Client->>KVCacheManager: requestBlock()
  KVCacheManager->>BlockManager: getFreeBlock()
  alt primary has free block
    BlockManager->>PrimaryPool: allocate()
    PrimaryPool-->>BlockManager: block
  else primary needs space
    BlockManager->>SecondaryPool: offload eligible blocks
    SecondaryPool-->>BlockManager: offloaded
    BlockManager->>PrimaryPool: allocate()
    PrimaryPool-->>BlockManager: block
  end
  BlockManager-->>KVCacheManager: block
  KVCacheManager-->>Client: block

  note over BlockManager,SecondaryPool: Onboarding/offloading no longer gated by onboardBlocks flag

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

NVIDIA/TensorRT-LLM#6249 — Also edits KV cache managers’ constructor signatures, removing a boolean; overlaps same classes and APIs.
NVIDIA/TensorRT-LLM#6244 — Modifies KVCacheManager/WindowBlockManager/BlockManager APIs and cache pool behavior; touches same headers.
NVIDIA/TensorRT-LLM#6563 — Changes KvCacheConfig/KV cache interfaces (adds attention-DP fields); intersects with files updated here.

Suggested labels

KV-Cache Management

Suggested reviewers

tomeras91
Tabrizian
thorjohnsen
achartier
Funatiq

✨ Finishing Touches

[ ] 📝 Generate Docstrings

🧪 Generate unit tests

[ ] Create PR with unit tests
[ ] Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai or @coderabbitai title anywhere in the PR title to generate the title automatically.

Status, Documentation and Community

Visit our Status Page to check the current availability of CodeRabbit.
Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

Sep 02 '25 07:09 coderabbitai[bot]

/bot run --disable-fail-fast

Sep 02 '25 08:09 eopXD

PR_Github #17338 [ run ] triggered by Bot

Sep 02 '25 08:09 tensorrt-cicd

PR_Github #17338 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #13031 completed with status: 'FAILURE'

Sep 02 '25 09:09 tensorrt-cicd

/bot run --disable-fail-fast

Sep 02 '25 11:09 eopXD

PR_Github #17361 [ run ] triggered by Bot

Sep 02 '25 11:09 tensorrt-cicd

PR_Github #17361 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #13049 completed with status: 'FAILURE'

Sep 02 '25 17:09 tensorrt-cicd

/bot run

Sep 03 '25 05:09 eopXD

PR_Github #17484 [ run ] triggered by Bot

Sep 03 '25 06:09 tensorrt-cicd

PR_Github #17484 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #13139 completed with status: 'FAILURE'

Sep 03 '25 07:09 tensorrt-cicd

/bot run

Sep 03 '25 13:09 eopXD

/bot run

Sep 03 '25 13:09 eopXD

PR_Github #17540 [ run ] triggered by Bot

Sep 03 '25 13:09 tensorrt-cicd

PR_Github #17540 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #13186 completed with status: 'FAILURE'

Sep 03 '25 15:09 tensorrt-cicd

/bot run --disable-fail-fast

Oct 08 '25 05:10 eopXD

PR_Github #20768 [ run ] triggered by Bot

Oct 08 '25 05:10 tensorrt-cicd

PR_Github #20768 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #15696 completed with status: 'FAILURE'

Oct 08 '25 06:10 tensorrt-cicd

/bot run --disable-fail-fast

Oct 10 '25 08:10 eopXD

PR_Github #21008 [ run ] triggered by Bot

Oct 10 '25 08:10 tensorrt-cicd

PR_Github #21008 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #15884 completed with status: 'FAILURE'

Oct 10 '25 09:10 tensorrt-cicd

/bot run --disable-fail-fast

Oct 13 '25 20:10 eopXD

PR_Github #21251 [ run ] triggered by Bot

Oct 13 '25 21:10 tensorrt-cicd

PR_Github #21251 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #16042 completed with status: 'FAILURE'

Oct 13 '25 21:10 tensorrt-cicd

/bot run --disable-fail-fast

Oct 13 '25 22:10 eopXD

PR_Github #21257 [ run ] triggered by Bot

Oct 13 '25 22:10 tensorrt-cicd

PR_Github #21257 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #16048 completed with status: 'FAILURE'

Oct 13 '25 23:10 tensorrt-cicd

/bot run --disable-fail-fast

Oct 14 '25 00:10 eopXD

PR_Github #21267 [ run ] triggered by Bot

Oct 14 '25 01:10 tensorrt-cicd

PR_Github #21267 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #16057 completed with status: 'FAILURE'

Oct 14 '25 04:10 tensorrt-cicd

/bot run --disable-fail-fast

Oct 15 '25 02:10 eopXD