[V1][WIP] 2nd try of Hybrid allocator for full attention & sliding window attention interleaved models
This PR tries another implementation approach for #12655.
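
For context on what "hybrid allocator" means here: full-attention layers need KV-cache blocks for every token of the sequence, while sliding-window layers only need blocks covering the most recent window, so the two layer types can be allocated differently. The sketch below is only an illustration of that idea under assumed names (`HybridBlockAllocator`, `LayerSpec`, `append_token` are hypothetical); it is not the code in this PR or the vLLM KV cache manager.

```python
# Illustrative sketch only: class and method names are hypothetical,
# not taken from this PR or the vLLM codebase.
from dataclasses import dataclass
from typing import Optional


@dataclass
class LayerSpec:
    """Per-layer attention type: sliding_window is None for full attention."""
    sliding_window: Optional[int] = None


class HybridBlockAllocator:
    """Allocates KV-cache blocks per layer for a single sequence.

    Full-attention layers keep blocks for every token; sliding-window layers
    only keep blocks overlapping the last `sliding_window` tokens, so older
    blocks can be released as the sequence grows.
    """

    def __init__(self, layers: list[LayerSpec], block_size: int) -> None:
        self.layers = layers
        self.block_size = block_size
        # Block table per layer: list of block ids (None = freed / not needed).
        self.block_tables: list[list[Optional[int]]] = [[] for _ in layers]
        self._next_block_id = 0

    def _new_block(self) -> int:
        block_id = self._next_block_id
        self._next_block_id += 1
        return block_id

    def append_token(self, seq_len: int) -> None:
        """Grow the block tables so they cover a sequence of `seq_len` tokens."""
        num_blocks = (seq_len + self.block_size - 1) // self.block_size
        for layer, table in zip(self.layers, self.block_tables):
            # Allocate any blocks the new token needs.
            while len(table) < num_blocks:
                table.append(self._new_block())
            if layer.sliding_window is None:
                continue  # full attention: keep all blocks
            # Sliding window: blocks entirely outside the window can be freed.
            first_needed_token = max(0, seq_len - layer.sliding_window)
            first_needed_block = first_needed_token // self.block_size
            for i in range(first_needed_block):
                table[i] = None  # freed slot; a real allocator would recycle it

    def num_live_blocks(self) -> int:
        return sum(1 for table in self.block_tables
                   for block in table if block is not None)


if __name__ == "__main__":
    # A toy model with interleaved full-attention and sliding-window layers.
    layers = [LayerSpec(None), LayerSpec(sliding_window=64),
              LayerSpec(None), LayerSpec(sliding_window=64)]
    allocator = HybridBlockAllocator(layers, block_size=16)
    for seq_len in range(1, 257):
        allocator.append_token(seq_len)
    # Full-attention layers hold 16 blocks each; sliding-window layers hold
    # only the 4 blocks overlapping the last 64 tokens.
    print(allocator.num_live_blocks())  # 2*16 + 2*4 = 40
```

The point of the sketch is just the memory-savings argument: for interleaved models, a single allocator that tracks per-layer window sizes can drop far-past blocks on sliding-window layers instead of holding the full sequence for every layer.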