Woosuk Kwon comments

Results: 282 comments by Woosuk Kwon

[SpecDecode] Support EAGLE in V1

Hi @wwl2755, task 4 has already been addressed by #16087, and task 1 is being handled by #16035. Would you be interested in the others (5, 6, and 7 in particular)?

[SpecDecode] Support EAGLE in V1

@oreo-wjx Thanks for bringing it up. Yeah, we are aware of the issue. Fundamentally, it's because we set the random seed to `None` by default. Setting the random seed to...

[V1][Spec Decode] Ngram Spec Decode

@njhill Thanks for the detailed review! I do agree that this PR currently needs more iterations and should not break/degrade any case when spec decoding is unused. > Tasks out...

[V1][Spec Decode] Ngram Spec Decode

@LiuXiaoxuanPKU Could you please provide performance benchmarks?
1. Main branch
2. This PR without spec decoding
3. This PR with spec decoding + low QPS
4. This PR with spec...

[V1][Spec Decode] Ngram Spec Decode

@LiuXiaoxuanPKU Could you take a look at the failed tests? I'll approve the PR once the tests are green!

[V1][Spec Decode] Ngram Spec Decode

@LiuXiaoxuanPKU Just wanted to double check. Is the PR ready for merge?

[V1][Spec Decode] Ngram Spec Decode

@JaheimLee Thanks for reporting it! Fixed by #13359

[Benchmark] Add block_size option to benchmark_throughput.py

Can we actually remove this parameter and let each hardware or attention backend choose its own? @liangfu Does this sound good to you if we make such a change?

[P/D][V1] KV Connector API V1

@robertgshaw2-redhat As discussed offline, I'm ok with merging this PR. However, I'd like to defer any other followup PRs (such as #16625) until we land the hybrid memory allocator, since...

[Bug]: Mistral 3.1 Small Image inference is broken on 0.8.4

I haven't got to the root cause yet, but I feel the bug should be in the input processor. In @mgoin's example, the single image maps to 7920 tokens, which...