Woosuk Kwon
Hi @wwl2755, task 4 has already been addressed by #16087, and task 1 is being handled by #16035. Would you be interested in the others (5, 6, and 7 in particular)?
@oreo-wjx Thanks for bringing it up. Yeah, we are aware of the issue. Fundamentally, it's because we set the random seed to `None` by default. Setting the random seed to...
@njhill Thanks for the detailed review! I do agree that this PR currently needs more iterations and should not break/degrade any case when spec decoding is unused. > Tasks out...
@LiuXiaoxuanPKU Could you please provide performance benchmarks? 1. Main branch 2. This PR without spec decoding 3. This PR with spec decoding + low QPS 4. This PR with spec...
@LiuXiaoxuanPKU Could you take a look at the failed tests? I'll approve the PR once the tests are green!
@LiuXiaoxuanPKU Just wanted to double check. Is the PR ready for merge?
@JaheimLee Thanks for reporting it! Fixed by #13359
Can we actually remove this parameter and let each hardware or attention backend choose its own? @liangfu Would it sound good to you if we made such a change?
@robertgshaw2-redhat As discussed offline, I'm OK with merging this PR. However, I'd like to defer any other follow-up PRs (such as #16625) until we land the hybrid memory allocator, since...
I haven't got to the root cause yet, but I feel the bug should be in the input processor. In @mgoin's example, the single image maps to 7920 tokens, which...