Woosuk Kwon
- [x] 1. Correctly initialize and load the EAGLE draft model
- [x] 2. Consider the lookahead slots in the KV cache manager
- [x] 3. Cache `draft_probs` inside the...
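Item 2 can be illustrated with a minimal sketch. It assumes a paged KV cache with a fixed `block_size`, where a request must also reserve slots for the draft tokens it will propose ahead of verification. The helper `blocks_needed` and its parameters are hypothetical and are not vLLM's actual KV cache manager API.

```python
def blocks_needed(num_computed_tokens: int, num_new_tokens: int,
                  num_lookahead_slots: int, block_size: int) -> int:
    """Hypothetical helper: KV cache blocks a request must hold this step.

    Assumes lookahead slots (space for speculative draft tokens) are
    reserved on top of the tokens already computed and the new ones.
    """
    total_slots = num_computed_tokens + num_new_tokens + num_lookahead_slots
    # Integer ceiling division: a partially filled block still counts.
    return (total_slots + block_size - 1) // block_size


# With 16-token blocks, 30 computed tokens and 1 new token fit in 2 blocks,
# but reserving 3 lookahead slots for draft tokens pushes it to 3 blocks.
print(blocks_needed(30, 1, 0, 16))  # 2
print(blocks_needed(30, 1, 3, 16))  # 3
```

Forgetting the lookahead term would let draft tokens write past the blocks actually allocated to the request, which is why the allocation check has to include it.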
vLLM V1 has been the default engine since version v0.8.0, released approximately three months ago. With substantial user adoption and overwhelmingly positive feedback on V1, we propose formally deprecating vLLM...
A hacky way to save ~6 secs in startup time.
# Key Changes
* Remove persistent batch
* No “reordering” & complex bookkeeping
* Almost all CPU states are NumPy arrays → we can vectorize most of the Python loops...
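To make the vectorization point concrete, here is a minimal sketch of replacing a per-request Python loop with NumPy operations. It assumes a hypothetical task of computing the position of every scheduled token in a flattened batch from two per-request arrays, `num_computed` and `num_scheduled`; these names and the task itself are illustrative, not taken from vLLM's code.

```python
import numpy as np


def positions_loop(num_computed, num_scheduled):
    # Baseline: one Python-level iteration per request.
    out = []
    for start, n in zip(num_computed, num_scheduled):
        out.extend(range(start, start + n))
    return np.array(out, dtype=np.int64)


def positions_vectorized(num_computed, num_scheduled):
    num_computed = np.asarray(num_computed, dtype=np.int64)
    num_scheduled = np.asarray(num_scheduled, dtype=np.int64)
    total = int(num_scheduled.sum())
    # Request index of every token, e.g. [0, 0, 1, 1, 1, 2].
    req_idx = np.repeat(np.arange(len(num_scheduled)), num_scheduled)
    # Offset in the flattened batch where each request's tokens start.
    starts = np.cumsum(num_scheduled) - num_scheduled
    # Offset of each token within its own request's chunk.
    offsets = np.arange(total) - starts[req_idx]
    return num_computed[req_idx] + offsets


print(positions_vectorized([5, 0, 10], [2, 3, 1]))  # [ 5  6  0  1  2 10]
```

Both functions return the same array; the vectorized version does a fixed number of NumPy calls regardless of batch size, so its Python overhead no longer grows with the number of requests.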