Cade Daniel comments

Results: 121 comments of Cade Daniel

[Frontend] Add bad_words_ids sampling parameter

Hi @Alvant. Please remember that vLLM committers are not paid and it is unfair to treat them with these kinds of reminders. Regardless of whether this PR is mergeable...

[Core][Speculative Decoding] Add multi-query verifier for speculative decoding without batch expansion

I will take a pass this week. Also cc @LiuXiaoxuanPKU. One major challenge is CUDA graph support. It will be necessary since the PyTorch dispatch and scheduling overhead is large...

[Core][Speculative Decoding] Add multi-query verifier for speculative decoding without batch expansion

Hi @sighingnow, thanks for adding this. Feedback:
* The MultiQueryTop1Scorer is really good
* For the CUDA graph support, can we instead integrate with @LiuXiaoxuanPKU's work in https://github.com/vllm-project/vllm/pull/6052...

[Core][Speculative Decoding] Add multi-query verifier for speculative decoding without batch expansion

I am not sure how soon it will be merged. cc @LiuXiaoxuanPKU

[Core][Speculative Decoding] Add multi-query verifier for speculative decoding without batch expansion

cc @comaniac any thoughts on how this CUDA graph approach works with model runner / prepare inputs?

[Core][Speculative Decoding] Add multi-query verifier for speculative decoding without batch expansion

One alternative is to move this to a custom model runner, just for spec decode. Do you think that's better or worse than the current approach?

[Core][Speculative Decoding] Add multi-query verifier for speculative decoding without batch expansion

SGTM. @sighingnow we can merge this PR as is if we remove the cuda graph stuff, or we can add cuda graph stuff (using one of the two approaches @comaniac...

[RFC]: Implement disaggregated prefilling via KV cache transfer

I gave a comment offline, pasting it here:
> The concept makes sense in vLLM but I am concerned we are starting with the infra first instead of the impactful...

[Bug]: prefix-caching: inconsistent completions

We have an improved block manager which has better test coverage for prefix caching. We have tests which compare equality of prefix caching vs non-prefix caching -- so this case...

Benchmark: add H100 suite

this is awesome, thanks for adding