mini-sglang
[Feature] Implement variable page size support
Summary
This PR removes the hardcoded `page_size=1` restriction so the engine can be configured with larger page sizes (e.g., 16 or 32). The setting is propagated through the Engine, Scheduler, and KV Cache layers to support more efficient PagedAttention.
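With a page size greater than 1, a token is no longer identified by a page index alone; it lives at an offset inside a page. The following is a minimal sketch of that mapping. It assumes a per-request page table and a flat, slot-indexed KV buffer, and the helper names `token_slot` and `pages_needed` are hypothetical, not mini-sglang functions.

```python
from typing import List


def token_slot(page_table: List[int], pos: int, page_size: int) -> int:
    """Map a token position within a request to a flat KV-buffer slot.

    Hypothetical helper: page_table[k] is the physical page backing the
    request's k-th logical page, and each physical page owns page_size
    consecutive slots in the buffer.
    """
    page_idx, offset = divmod(pos, page_size)
    return page_table[page_idx] * page_size + offset


def pages_needed(seq_len: int, page_size: int) -> int:
    """Pages required to hold seq_len tokens (ceiling division)."""
    return (seq_len + page_size - 1) // page_size


if __name__ == "__main__":
    page_size = 16
    # Logical pages 0 and 1 of this request live in physical pages 7 and 2.
    page_table = [7, 2]
    print(token_slot(page_table, 0, page_size))   # 112
    print(token_slot(page_table, 17, page_size))  # 33
    print(pages_needed(17, page_size))            # 2
```

With `page_size=1`, `token_slot` simply returns `page_table[pos]`, which is the one-slot-per-page mapping the old restriction implied.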
Key Changes
- CLI: Added a `--page-size` argument to `ServerArgs`.
- KV Cache (`mha_pool.py`):
  - Updated `kv_buffer` initialization to use `total_slots` (`num_pages * page_size`) instead of just `num_pages`.
  - Flattened the underlying storage shape calculation.
- Engine:
  - Updated the `dummy_page` and `max_seq_len` calculations to account for the configured page size.
  - Removed the `assert page_size == 1` constraints.
- Scheduler: Updated `CacheManager` and the memory managers (Naive/Radix) to accept and respect the `page_size` parameter during initialization and integrity checks.
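The `total_slots` sizing described for `mha_pool.py` can be illustrated with a plain shape calculation. This is a sketch under assumptions: the axis order (K/V, layer, slot, head, head_dim), the helper names `kv_buffer_shape` and `kv_buffer_bytes`, and the 2-byte element size are hypothetical and may not match the actual pool.

```python
import math
from typing import Tuple


def kv_buffer_shape(
    num_layers: int,
    num_pages: int,
    page_size: int,
    num_kv_heads: int,
    head_dim: int,
) -> Tuple[int, ...]:
    """Shape of a flattened K/V buffer indexed by slot rather than by page.

    Hypothetical layout: (2 for K and V, layers, slots, heads, head_dim).
    The slot axis holds num_pages * page_size entries; with page_size == 1
    this reduces to one slot per page, i.e. the previous sizing.
    """
    total_slots = num_pages * page_size
    return (2, num_layers, total_slots, num_kv_heads, head_dim)


def kv_buffer_bytes(shape: Tuple[int, ...], dtype_bytes: int = 2) -> int:
    """Memory footprint of the buffer (2 bytes per element for fp16/bf16)."""
    return math.prod(shape) * dtype_bytes


if __name__ == "__main__":
    shape = kv_buffer_shape(num_layers=32, num_pages=1024, page_size=16,
                            num_kv_heads=8, head_dim=128)
    print(shape)                   # (2, 32, 16384, 8, 128)
    print(kv_buffer_bytes(shape))  # 2147483648 (2 GiB)
```

Keeping a single flat slot axis means a token's storage is addressed by one integer (`page * page_size + offset`) regardless of the page size, so the same indexing works for `page_size=1` and larger pages.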
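Page-granular allocation and the accompanying integrity check can be sketched with a simple free-list allocator. `NaivePageAllocator` and its methods are hypothetical illustrations, not the actual `CacheManager` or Naive/Radix manager APIs, and the radix-tree variant is not modeled.

```python
from typing import List, Set


class NaivePageAllocator:
    """Hypothetical page-granular allocator (not mini-sglang's real API).

    Memory is handed out in whole pages of page_size slots, so a request
    holding n tokens consumes ceil(n / page_size) pages.
    """

    def __init__(self, num_pages: int, page_size: int) -> None:
        if page_size < 1:
            raise ValueError("page_size must be >= 1")
        self.num_pages = num_pages
        self.page_size = page_size
        self.free_pages: List[int] = list(range(num_pages))
        self.used_pages: Set[int] = set()

    def pages_for(self, num_tokens: int) -> int:
        # Round up: a partially filled last page still occupies a full page.
        return (num_tokens + self.page_size - 1) // self.page_size

    def free_slots(self) -> int:
        # Capacity is reported in token slots, not pages.
        return len(self.free_pages) * self.page_size

    def allocate(self, num_tokens: int) -> List[int]:
        needed = self.pages_for(num_tokens)
        if needed > len(self.free_pages):
            raise MemoryError(f"need {needed} pages, {len(self.free_pages)} free")
        pages = [self.free_pages.pop() for _ in range(needed)]
        self.used_pages.update(pages)
        return pages

    def free(self, pages: List[int]) -> None:
        for p in pages:
            self.used_pages.remove(p)  # raises KeyError on a double free
            self.free_pages.append(p)

    def check_integrity(self) -> None:
        """Every page is either free or in use: never both, never missing."""
        free = set(self.free_pages)
        assert len(free) == len(self.free_pages), "duplicate free page"
        assert free.isdisjoint(self.used_pages), "page both free and used"
        assert len(free) + len(self.used_pages) == self.num_pages, "leaked page"


if __name__ == "__main__":
    alloc = NaivePageAllocator(num_pages=8, page_size=16)
    pages = alloc.allocate(40)   # ceil(40 / 16) = 3 pages
    print(len(pages), alloc.free_slots())  # 3 80
    alloc.check_integrity()
    alloc.free(pages)
    print(alloc.free_slots())    # 128
```

Rounding up to whole pages trades some internal fragmentation (at most `page_size - 1` unused slots per request) for fewer, larger allocations.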