Woosuk Kwon

284 comments by Woosuk Kwon

@cadedaniel Thanks for the suggestion! Here's what we've decided to do: 1. We'll add a deprecation warning for beam search (#6402) and plan to release a new version next week....

@nightflight-dk Thanks for your input! Are you using vLLM in production? If so, we'd be happy to discuss our plan with you.

@hrsmanian @zhouyuan @lanking520 @nightflight-dk @HeegonJin @SemMulder @darabos @DhruvaBansal00 @tmostak @physicsrob @YooSungHyun @denadai2 @sjmielke @Reichenbachian @AaronFriel @hinnefe2 @mflaxman10 Due to strong pushback from the community, we have decided to reconsider this...

> I guess this should be pinned upfront to make it known to all, if this is effective immediately.

@wwl2755 Just pinned it. Thanks!

> What's the status on supporting priority scheduling in V1? That is a critical feature for us...

@gilljon Could you please elaborate more?

@robertgshaw2-redhat @warlockedward IIUC, V100 is not supported. None of the current attention backends support V100. T4 can be supported with the Triton attention backend though. Because Triton dropped T4 and...

Please find the meetup slides [here](https://docs.google.com/presentation/d/1A--47JAK4BJ39t954HyTkvtfwn0fkqtsL8NGFuslReM/edit?usp=sharing)!

@khluu Huge thanks for your help!

@DarkLight1337 Thanks for sharing it! In my experiment, this PR reduces the startup time of Qwen2.5-VL-3B from 120 secs to 55 secs. It definitely helps. That said, I'm not sure...

@ywang96 Thanks for the investigation. Didn't know that it is caused by the video input. 🤔