[Bugfix][cache_kernels]: Fix OOB in cache_kernels.cu
Purpose
This PR fixes a potential out-of-bounds (OOB) memory access in the gather_and_maybe_dequant_cache CUDA kernel, as originally reported in Issue #27909.
The bug was identified by static analysis. The root cause was that the offset (calculated as seq_starts[bid] / block_size) was not validated against block_table_stride (the length of the block-table row) before being used to index the batch_block_table pointer.
This PR adds two bounds checks (offset + pid < block_table_stride and offset + full_blocks_end < block_table_stride) to ensure all memory accesses are within the valid range of the block table.
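As a rough host-side illustration of the fix (plain Python with hypothetical names mirroring the kernel's variables; this is not the actual CUDA code), the change amounts to guarding every block-table lookup against the end of the row:

```python
def gather_blocks(batch_block_table, block_table_stride, seq_start, block_size, num_blocks):
    """Host-side model of the guarded block-table indexing (hypothetical names).

    batch_block_table: flat list standing in for one row of the kernel's block table.
    block_table_stride: number of valid entries in that row.
    seq_start: this sequence's starting token offset (seq_starts[bid] in the kernel).
    """
    # The offset that issue #27909 showed could point near the end of the table.
    offset = seq_start // block_size
    gathered = []
    for pid in range(num_blocks):
        # The added bounds check: stop before reading past the end of the row.
        if offset + pid >= block_table_stride:
            break
        gathered.append(batch_block_table[offset + pid])
    return gathered
```

When seq_start places offset near the end of the row (the scenario from the issue), the guard stops the lookup before it runs past block_table_stride instead of reading out of bounds.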
Test Plan
A new unit test (tests/kernels/test_cache_kernels.py::test_gather_cache_oob_issue_27909) was added to specifically reproduce the exact boundary condition described in the issue (where seq_starts causes the offset to point near the end of the table).
Test Result
The fix was verified by running the new unit test under compute-sanitizer --tool memcheck.
Before fix: compute-sanitizer reported an Invalid global read error (as expected).
After fix: The test passes, and compute-sanitizer reports no errors, confirming the OOB access is resolved.
/gemini review
@codex review
Codex Review: Didn't find any major issues. What shall we delve into next?
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
/gemini review
@codex review
Codex Review: Didn't find any major issues. Hooray!
Thanks for the work! Could you also run lm_eval to verify the accuracy and vllm bench to show the E2E performance?
Absolutely, thank you for the suggestion. I will run lm_eval and the benchmark and update the results.
Hi @yewentao256 , sorry for the late reply. I've been quite busy these past few days. I have updated the lm_eval and benchmark tables. If you have time, please take a look.
Hi @yewentao256 , Thanks for your approval.
Hi @yewentao256 , I have merged the latest changes from main using the Update branch button. I also investigated the failing CI checks: the external/flaky failures (403 and UCX errors) are unrelated to this fix. When you have time, could you please re-run the CI for me?
Hi @yewentao256 , the latest run failed due to an OOM error in the examples-test. This seems to be an environment resource issue, not a code bug. When you have time, could you please check the Buildkite resource allocation or re-queue this test on a node with more GPU memory? Thanks!
Hi @yewentao256 , since the previous CI run failed again due to the OOM error in the examples-test, and the core kernel fix is complete, this seems to be a persistent CI resource limitation. Could you please help adjust the CI configuration to run this test on a larger GPU node pool? Thank you!
Force merging as the OOM isn't related