Roger Wang
@DarkLight1337 Thanks for sharing your thoughts! @zhuohan123 and I actually discussed the use of `AutoProcessor`. I think the point is that today `vLLM` already relies on `AutoTokenizer`, and most...
For folks who came across this RFC, I have been working closely with @DarkLight1337 on several PRs: - [x] #4910 - [x] #4328 - [x] #4197 - [x] #5237 The...
> Hi forks, I think when we want to refactor the code, we should not only consider the multi modal input, but also the multi modal output. Hey @nukes! I...
> @ywang96 Thank you for assigning yourself to the review! After checking the Buildkite performance-benchmark pipeline, I noticed that it is stuck at the `Wait for container to be ready`...
@WoosukKwon @youkaichao Sorry, I haven't had a chance to work on this yet (I had the flu over the weekend). I will try to investigate further by the end of Friday.
Note: **we also have this issue on V0**, but it wasn't as pronounced there because the default `max-num-seqs` is 256 (instead of 1024 on V1).
Discussed with @youkaichao offline: for now we will "bypass" the cumem tests for V1 and properly fix this for V1 sleep mode later.
@jpli02 looks like @vincent-4 is working on this but we totally don't mind collaborating! This will be a good way to learn about all of our features too :)
@thesuperzapper Friendly ping - let me know if there's anything I need to do for this PR!
I'm considering this completed. The hidden states processor has already been integrated into vLLM. After some discussion within the vLLM core group, we have decided that this is where...