DarkSharpness
Hi. At this moment, we don't have plans to support diffusion models. Diffusion workloads differ substantially from language-model serving and introduce significantly more complexity, so it's non-trivial...
Thanks. Actually, when `page_size > 1`, the page-index allocation logic is quite different. Page indices must be (de)allocated at the granularity of `page_size`. It's much trickier and does not...
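To illustrate the page-granular bookkeeping (a minimal sketch with a hypothetical `PageAllocator` class, not mini-sglang's actual allocator): with `page_size > 1`, free memory is tracked in whole pages, and a request for `n` tokens must round up to `ceil(n / page_size)` pages, so a partially filled page still consumes a full page.

```python
class PageAllocator:
    """Hypothetical page-granular KV-cache allocator (illustration only)."""

    def __init__(self, num_pages: int, page_size: int):
        self.page_size = page_size
        self.free_pages = list(range(num_pages))  # indices of free pages

    def alloc(self, num_tokens: int) -> list[int]:
        # Round up: a partially used page still occupies a whole page.
        need = -(-num_tokens // self.page_size)
        if need > len(self.free_pages):
            raise MemoryError("out of KV-cache pages")
        pages, self.free_pages = self.free_pages[:need], self.free_pages[need:]
        return pages

    def free(self, pages: list[int]) -> None:
        # Deallocation is page-granular as well.
        self.free_pages.extend(pages)

allocator = PageAllocator(num_pages=8, page_size=16)
pages = allocator.alloc(33)  # 33 tokens -> 3 pages of 16 tokens each
print(len(pages))            # 3
```

With `page_size == 1` this degenerates to per-token allocation, which is why the single-token path is so much simpler.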
@DhiraPT Yes, for future support of MLA models. Popular attention implementations like `FlashMLA` and `trtllm_mla_decode` (from flashinfer) require a fixed page size of 64 or 128, so we need this...
LGTM. Will get it merged after we implement MLA models.
Thanks. Personally, I think this is too heavy for mini-sglang. In addition, I'm not sure whether it would bring a concrete performance gain in a real-world setting. Since we already have...