Woosuk Kwon

284 comments by Woosuk Kwon

Hi @rahulbatra85 Thanks for the PR! Could you please provide performance benchmarks? Particularly, the perf benchmark in the Llama setting (head_size=128, num_kv_heads=8, etc.) would be useful.

@youkaichao Thanks for letting me know! Just fixed it.

> Can you elaborate on the custom Pallas kernel for PagedAttention? Is there any links?

Good question. It's not open-sourced yet, but I was told that it will be released...

> Is this true after we moved to 1dquery? Or does it mean we need to support both 1d and 2d query inputs?

@rkooo567 I believe the change won't affect...

cc @ruisearch42 @richardliaw @comaniac

@zhuohan123 Can you please take another look?

@22quinn Thanks for volunteering! Could you please submit a PR by EoW?

@akeshet We plan to re-design the API for that. We will probably not allow per-request logits processors (because they are too complex and slow). We are exploring other options. Please...

@maliknaik16 Please feel free to take it! @22quinn Let us know if you already have the PR.