[Cherry-Pick][Speculative Decoding][BugFix] Fix attention bug in spec decoding (#5460)
Motivation
Cherry-pick PR #5460 to the release/2.4 branch to fix a KV partitioning bug in speculative decoding.
Modifications
fastdeploy/engine/engine.py
- Removed hardcoded `FLAGS_max_partition_size=1024` for multimodal models
fastdeploy/model_executor/layers/attention/append_attn_backend.py
- Disable KV partitioning when speculative decoding is active by setting `max_partition_size = max_seq_len`
- Prevents split KV bug that corrupts attention in speculative decoding
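The partition-size rule described above can be sketched as follows. This is a minimal illustration, not the code in `append_attn_backend.py`: the function names `resolve_max_partition_size` and `num_kv_partitions` and their parameters are hypothetical, and the ceiling-division partition count is an assumption about how split KV divides a sequence.

```python
def resolve_max_partition_size(
    max_seq_len: int,
    speculative_enabled: bool,
    configured_partition_size: int,
) -> int:
    """Hypothetical helper: pick the KV partition size for attention.

    When speculative decoding is active, the partition size is set to
    max_seq_len so that no sequence is ever split across partitions.
    Otherwise the configured value (for example, one read from
    FLAGS_max_partition_size) is used unchanged.
    """
    if speculative_enabled:
        return max_seq_len
    return configured_partition_size


def num_kv_partitions(seq_len: int, max_partition_size: int) -> int:
    """Assumed partition count: ceiling division of seq_len by partition size."""
    return (seq_len + max_partition_size - 1) // max_partition_size


# With speculative decoding on, an 8192-token sequence stays in one partition.
spec_size = resolve_max_partition_size(8192, True, 1024)
print(spec_size, num_kv_partitions(8192, spec_size))

# With it off, the configured size of 1024 splits the same sequence.
plain_size = resolve_max_partition_size(8192, False, 1024)
print(plain_size, num_kv_partitions(8192, plain_size))
```

Under these assumptions, a partition size equal to `max_seq_len` always yields a single partition, which is how the fix sidesteps the split KV path entirely instead of patching it.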
Usage or Command
No usage changes. Fix is applied automatically when speculative decoding is enabled.
Accuracy Tests
Accuracy testing performed in original PR #5460.
Checklist
- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`,`[Scheduler]`,`[PD Disaggregation]`,`[Executor]`,`[Graph Optimization]`,`[Speculative Decoding]`,`[RL]`,`[Models]`,`[Quantization]`,`[Loader]`,`[OP]`,`[KVCache]`,`[DataProcessor]`,`[BugFix]`,`[Docs]`,`[CI]`,`[Optimization]`,`[Feature]`,`[Benchmark]`,`[Others]`,`[XPU]`,`[HPU]`,`[GCU]`,`[DCU]`,`[Iluvatar]`,`[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- [x] Provide accuracy results.
- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
Original prompt
This section details on the original issue you should resolve
<issue_title>Please also submit the changes from https://github.com/PaddlePaddle/FastDeploy/pull/5460 to release/2.4</issue_title> <issue_description></issue_description>
<agent_instructions>cherry-pick PR</agent_instructions>
Comments on the Issue (you are @copilot in this section)
- Fixes PaddlePaddle/FastDeploy#5478