Cade Daniel

Results 121 comments of Cade Daniel

* FYI prefix caching + sliding window doesn't work with block manager v1 yet, so it's OK to deprioritize that for this PR: https://github.com/vllm-project/vllm/blob/b8afa8b95a4eee008a9b72440620113e5bfbe962/vllm/core/block_manager_v1.py#L218-L220
* personally I think...

See https://github.com/vllm-project/vllm/issues/4537

> I would like to have a try on this!

That would be great! Let me know if I can answer any questions.

We'll want to prioritize this soon so that we can deprecate the V1 block manager. cc @ruthe98 @rkooo567 @mmoskal.

Yeah, I think we can take inspiration from a devnull or zero block used in operating systems

@robertgshaw2-neuralmagic for block manager V2, we still need to do profiling before we swap over. I made an issue for tracking: https://github.com/vllm-project/vllm/issues/4537

sure if there's interest :) I mention it because `APC in BlockManagerv2 https://github.com/vllm-project/vllm/pull/4142` is not strictly necessary for release (block manager v2 not ready)

I feel the speculative decoding tests will balloon quite a bit as we add more framework features; we can get ahead of this by only triggering them if there's framework...