Woosuk Kwon

Results 65 issues of Woosuk Kwon

- [x] 1. Correctly initializing and loading the EAGLE draft model
- [x] 2. Consider the lookahead slots in the KV cache manager
- [x] 3. Cache `draft_probs` inside the...

speculative-decoding
v1

vLLM V1 has been the default engine since v0.8.0, released approximately three months ago. With substantial user adoption and overwhelmingly positive feedback on V1, we propose formally deprecating vLLM...

RFC

## Essential Elements of an Effective PR Description Checklist
- [ ] The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
-...

ready
multi-modality
qwen

A hacky way to save ~6 secs in startup time.

# Key Changes
* Remove persistent batch
* No “reordering” & complex bookkeeping
* Almost all CPU states are NumPy arrays → We can vectorize most of the Python loops...

documentation
needs-rebase
ci/build
v1
nvidia