Woosuk Kwon
@amitm02 Could you please benchmark the performance for the following three cases? 1. Current main (before this PR) 2. FIFO scheduling w/ this PR 3. Priority scheduling w/ this PR (when...
Hi @ekagra-ranjan, thanks for the PR! This is wonderful and so useful! A few things to note: 1. #17211 can be critical for the e2e performance 2. Currently, our implementation...
@ekagra-ranjan Please fix the lint errors. :)
QQ: Can we use a different term instead of "manager" for the single-type managers? It's a bit confusing since they're lower-level than the KV cache manager, but still called managers.
> SingleTypeKVCacheController Doesn't sound like a better name 😅 OK, let's keep "manager" and brainstorm whether there's a better option than "SingleTypeManager".
I'm seriously worried about this kind of PR, which adds many lines of code while I don't completely understand the need for it. Unfortunately, we are aiming to reduce the size of...
@dilipgb Thanks for the prompt reply. I closed the PR to revert this. However, we might reconsider if we find any difficulty in maintaining it. While this PR itself doesn't...
@imkero Thanks for the PR! This is amazing 🚀 Could you please resolve the merge conflict and the lint error?
@imkero Just so you know: To fix the CI failure, we should move `numba` from `requirements/cuda.txt` and `requirements/rocm.txt` to `requirements/common.txt`.
Thanks for the progress! Please let me know when this PR is ready for review!