Woosuk Kwon comments

Results: 284 comments of Woosuk Kwon

[RFC] Drop beam search support

@cadedaniel Thanks for the suggestion! Here's what we've decided to do: 1. We'll add a deprecation warning for beam search (#6402) and plan to release a new version next week....

[RFC] Drop beam search support

@nightflight-dk Thanks for your input! Are you using vLLM in production? If so, we'd be happy to discuss our plan with you.

[RFC] Drop beam search support

@hrsmanian @zhouyuan @lanking520 @nightflight-dk @HeegonJin @SemMulder @darabos @DhruvaBansal00 @tmostak @physicsrob @YooSungHyun @denadai2 @sjmielke @Reichenbachian @AaronFriel @hinnefe2 @mflaxman10 Due to strong pushback from the community, we have decided to reconsider this...

[RFC]: Deprecating vLLM V0

> I guess this should be pinned upfront to make it known to all, if this is effective immediately. @wwl2755 Just pinned it. Thanks!

[RFC]: Deprecating vLLM V0

> What's the status on supporting priority scheduling in V1? That is a critical feature for us... @gilljon Could you please elaborate more?

[RFC]: Deprecating vLLM V0

@robertgshaw2-redhat @warlockedward IIUC, V100 is not supported. None of the current attention backends support V100. T4 can be supported with the Triton attention backend though. Because Triton dropped T4 and...

The Third vLLM Bay Area Meetup (April 2nd 6-8:30pm)

Please find the meetup slides [here](https://docs.google.com/presentation/d/1A--47JAK4BJ39t954HyTkvtfwn0fkqtsL8NGFuslReM/edit?usp=sharing)!

[CI/Build][TPU] Add TPU CI test

@khluu Huge thanks for your help!

[Multimodal] Optimize Qwen2/2.5-VL startup time

@DarkLight1337 Thanks for sharing it! In my experiment, this PR reduces the startup time of Qwen2.5-VL-3B from 120 secs to 55 secs. It definitely helps. That said, I'm not sure...

[Multimodal] Optimize Qwen2/2.5-VL startup time

@ywang96 Thanks for the investigation. Didn't know that it is caused by the video input. 🤔