sglang [Bug] Decode OOM with spec

Describe the bug

The bug can be reproduced with the branch in #13740:

python -m unittest test_eagle_infer_b.py

Nov 21 '25 18:11 hnyls2002

Anyone interested can take a look and get assigned. It is a good issue for a deep dive into SGLang's memory management and advanced speculative decoding.

Nov 21 '25 18:11 hnyls2002

I have experience with spec decoding. Can I look into it?

Nov 22 '25 17:11 adityakamat24

@adityakamat24 Sure, please do it

Nov 22 '25 17:11 hnyls2002

Hey @hnyls2002

Did some digging. I think the OOM is from kv_allocated_len not getting updated in the paged allocation paths.

In eagle_info.py's prepare_for_verify(), the page_size == 1 branch updates kv_allocated_len after allocation. But the paged branch (the else case) just calls alloc_paged_token_slots_extend() and never updates kv_allocated_len. The same issue shows up in _draft_preprocess_decode() in eagle_worker.py. When release_kv_cache() runs, it uses kv_allocated_len to know what to free, so if that value is never updated, nothing gets freed. With 400 requests doing multiple decode iterations, memory just leaks until OOM.
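
To make the suspected failure mode concrete, here is a toy model of it. This is not SGLang's code: ToyPool, ToyReq, allocate_for_verify() and run() are hypothetical names made up for illustration, and only kv_allocated_len and release_kv_cache() borrow their names from the real code paths. It assumes the release path frees exactly kv_allocated_len slots, which is my reading of the code and still needs to be confirmed.

```python
from dataclasses import dataclass


@dataclass
class ToyReq:
    # What the release path believes this request has allocated.
    kv_allocated_len: int = 0


@dataclass
class ToyPool:
    free_slots: int

    def alloc(self, n: int) -> None:
        if n > self.free_slots:
            raise MemoryError("out of KV slots")
        self.free_slots -= n

    def free(self, n: int) -> None:
        self.free_slots += n


def allocate_for_verify(pool, req, n_tokens, page_size, update_paged_branch):
    """Allocate slots; optionally skip the bookkeeping in the paged branch."""
    pool.alloc(n_tokens)
    if page_size == 1 or update_paged_branch:
        req.kv_allocated_len += n_tokens
    # else: bookkeeping is skipped, which models the suspected bug


def release_kv_cache(pool, req):
    """Free only what the bookkeeping says was allocated."""
    pool.free(req.kv_allocated_len)
    req.kv_allocated_len = 0


def run(num_reqs, tokens_per_req, page_size, update_paged_branch, capacity=1000):
    """Allocate and release for each request; return the free slots left."""
    pool = ToyPool(free_slots=capacity)
    for _ in range(num_reqs):
        req = ToyReq()
        allocate_for_verify(pool, req, tokens_per_req, page_size, update_paged_branch)
        release_kv_cache(pool, req)
    return pool.free_slots
```

In this model, page_size == 1 returns every slot to the pool, the paged branch without the update loses tokens_per_req slots per request, and enough requests eventually raise MemoryError. Passing update_paged_branch=True stands in for the proposed fix.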

What do you think?

Nov 23 '25 05:11 adityakamat24

@hnyls2002 Can I also try this?

Dec 07 '25 22:12 vyalamar