Roger Wang comments

Results: 132 comments by Roger Wang

[RFC]: Multi-modality Support on vLLM

@DarkLight1337 Thanks for sharing your thoughts! @zhuohan123 and I actually discussed the use of `AutoProcessor`. I think the point is that today `vLLM` already relies on `AutoTokenizer`, and most...

[RFC]: Multi-modality Support on vLLM

For folks who came across this RFC, I have been working closely with @DarkLight1337 on several PRs:
- [x] #4910
- [x] #4328
- [x] #4197
- [x] #5237

The...

[RFC]: Multi-modality Support on vLLM

> Hi forks, I think when we want to refactor the code, we should not only consider the multi modal input, but also the multi modal output.

Hey @nukes! I...

[CI/Build][v1] vLLM v1 automatic benchmarking

> @ywang96 Thank you for assigning yourself to the review! After checking the Buildkite performance-benchmark pipeline, I noticed that it is stuck at the `Wait for container to be ready`...

[V1][Core] Fix memory issue with logits & sampling

@WoosukKwon @youkaichao sorry, but I haven't had a chance to work on this (got the flu over the weekend). I will try to investigate more by end of Friday.

[V1][Core] Fix memory issue with logits & sampling

Note - **we also have this issue on V0** but it wasn't this pronounced because the default max-num-seqs is 256 (instead of 1024 on V1)

[V1][Core] Fix memory issue with logits & sampling

Discussed with @youkaichao offline - for now we will "bypass" cumem tests for V1 and properly fix it for V1 sleep mode later.

[Usage]: Clean up Engine Args & Documentation

@jpli02 looks like @vincent-4 is working on this but we totally don't mind collaborating! This will be a good way to learn about all of our features too :)

fix(crud-web-apps): fix condition checking when missing message

@thesuperzapper Friendly ping - let me know if there's anything I need to do for this PR!

[RFC]: Extending VLLM towards native support of non text-generating models

I'm considering this completed. The hidden states processor has already been integrated into vLLM. After some discussion within the vLLM core group, we have decided that this is where...