Woosuk Kwon
Hi @rahulbatra85 Thanks for the PR! Could you please provide performance benchmarks? In particular, a benchmark in the Llama setting (head_size=128, num_kv_heads=8, etc.) would be useful.
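(For reference, a minimal timing sketch of that setting is below. It only exercises a single grouped-query attention call via PyTorch's `scaled_dot_product_attention`, not vLLM's benchmark suite, and the batch size, query-head count, and context length are illustrative assumptions.)

```python
# Minimal, illustrative timing sketch for the Llama-like attention shape
# mentioned above (head_size=128, num_kv_heads=8). Not the vLLM benchmark
# suite; batch size, number of query heads, and KV length are assumptions.
import time
import torch
import torch.nn.functional as F

head_size, num_kv_heads = 128, 8
num_q_heads = 32          # assumed (Llama-style 4:1 GQA ratio)
batch, kv_len = 16, 1024  # assumed decode batch size and context length

q = torch.randn(batch, num_q_heads, 1, head_size, device="cuda", dtype=torch.float16)
k = torch.randn(batch, num_kv_heads, kv_len, head_size, device="cuda", dtype=torch.float16)
v = torch.randn_like(k)

torch.cuda.synchronize()
start = time.perf_counter()
for _ in range(100):
    # enable_gqa requires PyTorch >= 2.5; otherwise repeat_interleave the KV heads.
    F.scaled_dot_product_attention(q, k, v, enable_gqa=True)
torch.cuda.synchronize()
print(f"avg decode attention latency: {(time.perf_counter() - start) / 100 * 1e3:.3f} ms")
```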
@youkaichao Thanks for letting me know! Just fixed it.
> Can you elaborate on the custom Pallas kernel for PagedAttention? Is there any links?

Good question. It's not open-sourced yet, but I was told that it will be released...
> Is this true after we moved to 1d query? Or does it mean we need to support both 1d and 2d query inputs?

@rkooo567 I believe the change won't affect...
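(A quick illustration of the terminology, with made-up sizes: the "2d" layout keeps a padded per-request query tensor, while the "1d" layout flattens all tokens of all requests into one tensor with no padding.)

```python
# Illustrative shapes only (hidden size and sequence lengths are made up).
import torch

hidden = 4096
seq_lens = [5, 3, 7]

# "2d" query: padded per-request layout.
q_2d = torch.zeros(len(seq_lens), max(seq_lens), hidden)

# "1d" query: all tokens flattened, no padding.
q_1d = torch.zeros(sum(seq_lens), hidden)

print(q_2d.shape)  # torch.Size([3, 7, 4096])
print(q_1d.shape)  # torch.Size([15, 4096])
```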
cc @ruisearch42 @richardliaw @comaniac
@zhuohan123 Can you please take another look?
@22quinn Thanks for volunteering! Could you please submit a PR by EoW?
@akeshet We plan to redesign the API for that. We will probably not allow per-request logits processors (because they are too complex and slow). We are exploring other options. Please...
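(For context, below is a minimal sketch of the kind of per-request logits processor being discussed, using the pre-V1 `SamplingParams.logits_processors` hook; the model name and banned token id are placeholders. Because it is an arbitrary Python callable invoked for one request at every decoding step, it is hard to batch or compile, which is the complexity/performance concern mentioned above.)

```python
# Sketch of a per-request logits processor via the pre-V1
# SamplingParams.logits_processors hook. Model name and token id are
# placeholders; the callable runs in Python for one request at each step.
import torch
from vllm import LLM, SamplingParams

def ban_token(generated_token_ids: list[int], logits: torch.Tensor) -> torch.Tensor:
    # Mask a single (placeholder) token id for this request only.
    logits[42] = float("-inf")
    return logits

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(max_tokens=16, logits_processors=[ban_token])
print(llm.generate(["Hello, my name is"], params)[0].outputs[0].text)
```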
@maliknaik16 Please feel free to take it! @22quinn Let us know if you already have the PR.
@22quinn Oh great. Thanks!