mini-sglang

[Feature] Implement variable page size support

Open • DhiraPT opened this issue 4 days ago • 3 comments

Summary

This PR removes the hardcoded page_size=1 restriction so that the engine can be configured with larger page sizes (e.g., 16 or 32 tokens per page). The setting is propagated through the Engine, Scheduler, and KV cache layers to support more efficient PagedAttention.
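With page_size > 1, each page holds page_size consecutive token slots, so a token's physical slot is derived from the page that holds it plus its offset within that page. The sketch below illustrates that arithmetic, assuming a flat slot layout in which physical page p owns slots p * page_size through (p + 1) * page_size - 1. The helper `token_to_slot` and the per-request `page_table` list are hypothetical names, not taken from mini-sglang.

```python
from typing import List


def token_to_slot(position: int, page_table: List[int], page_size: int) -> int:
    """Map a token's logical position in a sequence to a physical KV slot.

    Hypothetical helper: assumes physical page p owns the contiguous slots
    [p * page_size, (p + 1) * page_size).
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    logical_page = position // page_size  # which of the request's pages holds the token
    offset = position % page_size         # the token's index inside that page
    physical_page = page_table[logical_page]
    return physical_page * page_size + offset


# A request whose first two logical pages live in physical pages 7 and 2.
page_table = [7, 2]
print([token_to_slot(pos, page_table, page_size=4) for pos in range(6)])
# [28, 29, 30, 31, 8, 9]
```

With page_size=1 the formula collapses to page_table[position], i.e. one slot per page.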

Key Changes

  • CLI: Added a --page-size argument to ServerArgs.
  • KV Cache (mha_pool.py):
    • Updated kv_buffer initialization to use total_slots (num_pages * page_size) instead of just num_pages.
    • Flattened the underlying storage shape calculation.
  • Engine:
    • Updated dummy_page and max_seq_len calculations to account for the configured page size.
    • Removed the assert page_size == 1 constraints.
  • Scheduler: Updated CacheManager and the Naive/Radix memory managers to accept page_size at initialization and respect it in integrity checks.
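The PR's actual ServerArgs definition is not reproduced here. As a hedged illustration only, a --page-size flag could be parsed and validated with the standard-library argparse module as below. The parser, the `positive_int` validator, and the default of 1 (chosen to match the previously hardcoded value) are assumptions, not mini-sglang code.

```python
import argparse


def positive_int(value: str) -> int:
    """argparse type that accepts only integers >= 1."""
    parsed = int(value)  # non-integers raise ValueError, which argparse reports
    if parsed < 1:
        raise argparse.ArgumentTypeError(f"page size must be >= 1, got {value}")
    return parsed


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Hypothetical server launcher")
    # Assumed default of 1 preserves the old one-token-per-page behaviour.
    parser.add_argument(
        "--page-size",
        type=positive_int,
        default=1,
        help="Number of tokens stored per KV cache page (e.g. 16 or 32).",
    )
    return parser


args = build_parser().parse_args(["--page-size", "16"])
print(args.page_size)  # 16
```

argparse turns --page-size into the attribute page_size, and a value such as 0 exits with a usage error instead of reaching the cache allocator.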
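Sizing kv_buffer by num_pages alone is only correct when page_size == 1; once a page holds several tokens, the buffer needs num_pages * page_size slots. The sketch below shows a flattened shape calculation under an assumed layout of (K/V, layer, slot, kv_head, head_dim). The layout and the helpers `kv_buffer_shape` and `kv_buffer_bytes` are hypothetical; the real mha_pool.py layout may differ.

```python
from typing import Tuple


def kv_buffer_shape(
    num_layers: int,
    num_pages: int,
    page_size: int,
    num_kv_heads: int,
    head_dim: int,
) -> Tuple[int, int, int, int, int]:
    """Shape of a flattened K/V buffer (hypothetical layout).

    Pages are flattened into one slot dimension, so the buffer is sized by
    total_slots = num_pages * page_size rather than by num_pages alone.
    """
    total_slots = num_pages * page_size
    # Leading dimension 2 stores K and V separately.
    return (2, num_layers, total_slots, num_kv_heads, head_dim)


def kv_buffer_bytes(shape: Tuple[int, ...], dtype_bytes: int) -> int:
    """Total memory for a buffer of the given shape and element size."""
    total = dtype_bytes
    for dim in shape:
        total *= dim
    return total


shape = kv_buffer_shape(num_layers=2, num_pages=8, page_size=16, num_kv_heads=4, head_dim=64)
print(shape)                      # (2, 2, 128, 4, 64)
print(kv_buffer_bytes(shape, 2))  # 262144 bytes for a 2-byte dtype such as fp16
```

Because slot index = page_id * page_size + offset, a flat slot dimension lets kernels address any token with a single index.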
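The Naive and Radix managers themselves are not shown in this PR description. As a minimal sketch, assuming a free-list design that hands out whole pages, the class below illustrates what respecting page_size during allocation and integrity checks can mean: token counts are rounded up to whole pages, and the check verifies that every page is accounted for exactly once. `PageAllocator` and its methods are hypothetical names, not mini-sglang's API.

```python
from typing import List, Set


class PageAllocator:
    """Hypothetical page-granular allocator with an integrity check."""

    def __init__(self, num_pages: int, page_size: int) -> None:
        if num_pages <= 0 or page_size <= 0:
            raise ValueError("num_pages and page_size must be positive")
        self.num_pages = num_pages
        self.page_size = page_size
        self.free_pages: List[int] = list(range(num_pages))
        self.used_pages: Set[int] = set()

    def pages_needed(self, num_tokens: int) -> int:
        # Ceiling division: a partly filled last page still costs a whole page.
        return -(-num_tokens // self.page_size)

    def allocate(self, num_tokens: int) -> List[int]:
        n = self.pages_needed(num_tokens)
        if n > len(self.free_pages):
            raise MemoryError(f"need {n} pages, only {len(self.free_pages)} free")
        pages = [self.free_pages.pop() for _ in range(n)]
        self.used_pages.update(pages)
        return pages

    def free(self, pages: List[int]) -> None:
        for page in pages:
            self.used_pages.remove(page)  # KeyError on double free
            self.free_pages.append(page)

    def free_slots(self) -> int:
        # Capacity in token slots, not pages.
        return len(self.free_pages) * self.page_size

    def check_integrity(self) -> None:
        free = set(self.free_pages)
        assert len(free) == len(self.free_pages), "duplicate page in free list"
        assert not (free & self.used_pages), "page is both free and used"
        assert len(free) + len(self.used_pages) == self.num_pages, "page leaked"


alloc = PageAllocator(num_pages=4, page_size=16)
pages = alloc.allocate(20)  # 20 tokens -> 2 pages of 16 slots
print(len(pages), alloc.free_slots())  # 2 32
alloc.check_integrity()
alloc.free(pages)
alloc.check_integrity()
```

Rounding up to whole pages trades some internal fragmentation (up to page_size - 1 unused slots in a request's last page) for fewer page-table entries per request.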

DhiraPT, Dec 22 '25 10:12