[Bug] fill_kv_cache implementation has a bug

Open zxy1123 opened this issue 5 months ago • 1 comments

Checklist

[ ] 1. I have searched related issues but cannot get the expected help.
[ ] 2. The bug has not been fixed in the latest version.
[ ] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

Triton's arange requires its range to be a power of 2, but fill_kv_cache does not guarantee that the block size it obtains is a power of 2.
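To illustrate the constraint, here is a minimal plain-Python sketch of one common way to cope with a non-power-of-2 size: round the range up to the next power of 2 and mask off the extra lanes. The helper names next_power_of_two and padded_offsets are hypothetical, and the lists only stand in for what a kernel would do with arange and a mask. This is not the fill_kv_cache code.

```python
def next_power_of_two(n: int) -> int:
    """Return the smallest power of two that is >= n (n must be >= 1)."""
    if n < 1:
        raise ValueError(f"n must be >= 1, got {n}")
    return 1 << (n - 1).bit_length()


def padded_offsets(block_size: int):
    """Hypothetical helper: build a power-of-two range plus a validity mask.

    The range length is rounded up so it satisfies the power-of-2
    requirement; the mask marks which offsets fall inside the real block.
    """
    padded = next_power_of_two(block_size)
    offsets = list(range(padded))
    mask = [off < block_size for off in offsets]
    return offsets, mask


# A block size of 48 is padded to 64 lanes, of which only 48 are valid.
offsets, mask = padded_offsets(48)
```

The extra masking is exactly the kind of boundary check that costs performance inside a kernel, so this is a workaround rather than a recommendation.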

Reproduction

Running qwen2_vl 3b raises an error.

Environment

qwen2_vl 3b

Error traceback

Jul 27 '25 02:07 zxy1123

This block_size is the block size used in paged attention, and it is an engine configuration parameter: https://github.com/InternLM/lmdeploy/blob/5f0647f1181312975f05d16eeb166d5a69afb6ef/lmdeploy/messages.py#L342

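It is normally required to be a power of 2. Otherwise many modules and kernels, such as fill_kv_cache and paged_attention, are affected, and performance gains nothing from it (the kernels need more boundary checks, and using tensor cores in attention becomes more complicated). So there is in fact an implicit assumption about the block size here. Perhaps an assertion should be added to the checks at engine startup?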
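A startup check along those lines might look roughly like the sketch below. The function name check_block_size is hypothetical and not part of lmdeploy; only the power-of-2 requirement comes from this thread.

```python
def is_power_of_two(n: int) -> bool:
    """Return True if n is a positive power of two (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0


def check_block_size(block_size: int) -> int:
    """Hypothetical startup check: fail early on unsupported block sizes."""
    if not is_power_of_two(block_size):
        raise ValueError(
            f"block_size must be a power of 2, but got {block_size}")
    return block_size
```

Failing at engine startup gives a clear configuration error up front, rather than a kernel compilation error that only appears once a model is run.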

Jul 28 '25 06:07 grimoire