Woosuk Kwon

Results 65 issues of Woosuk Kwon

### Motivation. ### Overview As we transition to vLLM V1, we plan to discontinue support for the `best_of` sampling parameter. This decision is driven by a combination of low usage,...

RFC

### 🐛 Describe the bug When using Ray as the distributed executor backend...

bug
ray
v1

Pipeline parallelism in V1 requires `ray[adag]` instead of `ray[default]`. Also, because of API changes in Ray 2.42.0, we have to pin the version to `2.41.0` (or `2.40.0`).

ci/build
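The pin described above could be expressed as a requirements entry such as `ray[adag]==2.41.0`. As a minimal sketch of guarding against an incompatible install at runtime, the snippet below checks a version string against the two releases named in the note. The helper names `is_supported_ray_version` and `check_installed_ray` are hypothetical and not part of vLLM or Ray.

```python
from importlib.metadata import PackageNotFoundError, version

# Releases the note above names as usable; 2.42.0 is excluded because of
# its API changes. This set is an illustration, not an official list.
SUPPORTED_RAY_VERSIONS = {"2.40.0", "2.41.0"}


def is_supported_ray_version(ray_version: str) -> bool:
    """Return True if `ray_version` is one of the pinned Ray releases."""
    return ray_version.strip() in SUPPORTED_RAY_VERSIONS


def check_installed_ray() -> bool:
    """Return True if an installed `ray` distribution matches the pin."""
    try:
        return is_supported_ray_version(version("ray"))
    except PackageNotFoundError:
        # Ray is not installed at all.
        return False
```

An exact-match set is used here instead of a range comparison because the note pins specific releases and says nothing about earlier or later ones.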

### 🐛 Describe the bug Got an error message when using `tp_size=4`: ...

bug

cc @comaniac @ruisearch42

v1

### 🚀 The feature, motivation and pitch Currently, the V1 rejection sampler only supports greedy sampling. We need to expand it to random sampling. I think we can do this...

feature request
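To illustrate the difference between greedy and random rejection sampling mentioned in the request above, here is a minimal pure-Python sketch of the standard speculative-decoding accept/reject rule. It is not vLLM's implementation: the function names, the list-of-lists probability layout, and the convention that `target_probs` carries one extra row for a bonus token are all assumptions made for this example.

```python
import random


def _argmax(probs):
    """Index of the largest probability (first one wins on ties)."""
    return max(range(len(probs)), key=probs.__getitem__)


def greedy_verify(draft_tokens, target_probs):
    """Greedy case: accept draft tokens while they equal the target argmax.

    `target_probs` has len(draft_tokens) + 1 rows; the last row supplies a
    bonus token when every draft token is accepted.
    """
    out = []
    for i, tok in enumerate(draft_tokens):
        best = _argmax(target_probs[i])
        out.append(best)
        if best != tok:
            return out  # first mismatch: keep the target token and stop
    out.append(_argmax(target_probs[len(draft_tokens)]))
    return out


def random_verify(draft_tokens, draft_probs, target_probs, rng=random):
    """Random case: accept token x with probability min(1, p(x) / q(x)).

    On rejection, resample from the normalized residual max(0, p - q).
    """
    out = []
    for i, tok in enumerate(draft_tokens):
        p, q = target_probs[i], draft_probs[i]
        if q[tok] > 0 and rng.random() < min(1.0, p[tok] / q[tok]):
            out.append(tok)
            continue
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        if sum(residual) == 0:
            residual = list(p)  # degenerate case: fall back to the target
        out.append(rng.choices(range(len(p)), weights=residual)[0])
        return out
    last = target_probs[len(draft_tokens)]
    out.append(rng.choices(range(len(last)), weights=last)[0])
    return out
```

The greedy path needs only an argmax comparison per position, while the random path also needs the draft distribution and a residual resampling step, which is the extra work the request refers to.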

### 🚀 The feature, motivation and pitch The current V1 rejection sampler is not well optimized and incurs unnecessary overhead. In my benchmarks, it takes 10-25% of the overall running time....

feature request

### 🚀 The feature, motivation and pitch DeepSeek MTP should be ported to the new V1 architecture.

feature request

# Progress
- [x] Implement TPU executor that works on a single TPU chip (without tensor parallelism) #5292
- [x] Support single-host tensor parallel inference #5871
- [x] Support multi-host...

RFC
tpu
stale

### Motivation. For code cleanup, we plan to drop support for the prompt adapter feature. Please let us know if you are using it. ### Proposed Change. Dropping the prompt...

RFC