[Bugfix][cache_kernels]: Fix OOB in cache_kernels.cu
Purpose
This PR fixes a potential out-of-bounds (OOB) memory access in the gather_and_maybe_dequant_cache CUDA kernel, as originally reported in Issue #27909.
The bug was identified by static analysis. The root cause was that the offset (calculated as seq_starts[bid] / block_size) was not validated against block_table_stride (the length of the block-table row) before being used to index the batch_block_table pointer.
This PR adds two bounds checks (offset + pid < block_table_stride and offset + full_blocks_end < block_table_stride) to ensure all memory accesses are within the valid range of the block table.
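As a rough host-side illustration of the fix (plain Python with hypothetical names mirroring the kernel's variables; this is not the actual CUDA code), the change amounts to guarding every block-table lookup against the end of the row:

```python
def gather_blocks(batch_block_table, block_table_stride, seq_start, block_size, num_blocks):
    """Host-side model of the guarded block-table indexing (hypothetical names).

    batch_block_table: flat list standing in for one row of the kernel's block table.
    block_table_stride: number of valid entries in that row.
    seq_start: this sequence's starting token offset (seq_starts[bid] in the kernel).
    """
    # The offset that issue #27909 showed could point near the end of the table.
    offset = seq_start // block_size
    gathered = []
    for pid in range(num_blocks):
        # The added bounds check: stop before reading past the end of the row.
        if offset + pid >= block_table_stride:
            break
        gathered.append(batch_block_table[offset + pid])
    return gathered
```

When seq_start places offset near the end of the row (the scenario from the issue), the guard stops the lookup before it runs past block_table_stride instead of reading out of bounds.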
Test Plan
A new unit test (tests/kernels/test_cache_kernels.py::test_gather_cache_oob_issue_27909) was added to specifically reproduce the exact boundary condition described in the issue (where seq_starts causes the offset to point near the end of the table).
Test Result
The fix was verified by running the new unit test under compute-sanitizer --tool memcheck.
Before fix: compute-sanitizer reported an Invalid global read error (as expected).
After fix: The test passes, and compute-sanitizer reports no errors, confirming the OOB access is resolved.
/gemini review
@codex review
Codex Review: Didn't find any major issues. What shall we delve into next?
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
/gemini review
@codex review
Codex Review: Didn't find any major issues. Hooray!
Thanks for the work! Could you also run lm_eval to verify the accuracy and vllm bench to show the E2E performance?
Absolutely, thank you for the suggestion. I will run lm_eval and the benchmark and update the results.
Hi @yewentao256 , sorry for the late reply. I've been quite busy these past few days. I have updated the lm_eval and benchmark tables. If you have time, please take a look.
Hi @yewentao256 , Thanks for your approval.
Hi @yewentao256 , I have merged the latest changes from main using the Update branch button. I also investigated the failing CI checks: the external/flaky failures (403 and UCX errors) are unrelated to this fix. When you have time, could you please re-run the CI for me?
Hi @yewentao256 , the latest run failed due to an OOM error in the examples-test. This seems to be an environment resource issue, not a code bug. When you have time, could you please check the Buildkite resource allocation or re-queue this test on a node with more GPU memory? Thanks!
Hi @yewentao256 , since the previous CI run failed again due to the OOM error in the examples-test, and the core kernel fix is complete, this seems to be a persistent CI resource limitation. Could you please help adjust the CI configuration to run this test on a larger GPU node pool? Thank you!
Force merging as the OOM isn't related