
[Bug] Decode OOM with spec

Open hnyls2002 opened this issue 1 month ago • 3 comments

Describe the bug

The bug can be reproduced on the branch from #13740.

python -m unittest test_eagle_infer_b.py

hnyls2002 avatar Nov 21 '25 18:11 hnyls2002

Anyone interested can take a look and get assigned. It is a good issue for a deep dive into SGLang's memory management and advanced speculative decoding.

hnyls2002 avatar Nov 21 '25 18:11 hnyls2002

I have experience with speculative decoding. Can I look into it?

adityakamat24 avatar Nov 22 '25 17:11 adityakamat24

@adityakamat24 Sure, please do it

hnyls2002 avatar Nov 22 '25 17:11 hnyls2002

Hey @hnyls2002

Did some digging. I think the OOM is from kv_allocated_len not getting updated in the paged allocation paths.

In eagle_info.py's prepare_for_verify(), the page_size == 1 branch updates kv_allocated_len after allocation. But the paged branch (the else case) just calls alloc_paged_token_slots_extend() and never touches it. eagle_worker.py's _draft_preprocess_decode() has the same issue. When release_kv_cache() runs, it uses kv_allocated_len to know what to free, so if that field is never updated, the newly allocated slots never get freed. With 400 requests each doing multiple decode iterations, memory just leaks until OOM.
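To make the suspected mechanism concrete, here is a minimal toy sketch. It is not SGLang code: ToyPool, ToyRequest, decode_step, and run are hypothetical names, and the pool is reduced to a free-slot counter. It only assumes what is described above, namely that release frees exactly kv_allocated_len slots.

```python
from dataclasses import dataclass


@dataclass
class ToyRequest:
    """Hypothetical stand-in for a request; not SGLang's real class."""
    kv_allocated_len: int = 0  # slots the request believes it owns


class ToyPool:
    """Hypothetical KV slot pool reduced to a free-slot counter."""

    def __init__(self, size: int) -> None:
        self.free_slots = size

    def alloc(self, n: int) -> None:
        if n > self.free_slots:
            raise MemoryError("out of KV cache slots")
        self.free_slots -= n

    def free(self, n: int) -> None:
        self.free_slots += n


def decode_step(pool: ToyPool, req: ToyRequest, n: int, update_len: bool) -> None:
    """Allocate n slots; update_len=False mimics the suspected paged branch."""
    pool.alloc(n)
    if update_len:
        req.kv_allocated_len += n


def release_kv_cache(pool: ToyPool, req: ToyRequest) -> None:
    """Free only what the request's bookkeeping says it owns."""
    pool.free(req.kv_allocated_len)
    req.kv_allocated_len = 0


def run(update_len: bool, num_requests: int = 400, steps: int = 4,
        tokens_per_step: int = 8, pool_size: int = 16384) -> int:
    """Run requests one after another and return the free slots left over."""
    pool = ToyPool(pool_size)
    for _ in range(num_requests):
        req = ToyRequest()
        for _ in range(steps):
            decode_step(pool, req, tokens_per_step, update_len)
        release_kv_cache(pool, req)
    return pool.free_slots
```

With update_len=True, run() hands back the full pool. With update_len=False, every request leaves its slots behind, and a small enough pool_size ends in MemoryError, which is the toy analogue of the OOM.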

What do you think?

adityakamat24 avatar Nov 23 '25 05:11 adityakamat24

@hnyls2002 can I also try this?

vyalamar avatar Dec 07 '25 22:12 vyalamar