Cyrus Leung
> Regarding #4228, I think there may be a situation where some MM models don't have a Processor implemented.
>
> In this case, we would have to refactor...
Also, I think that we should wrap the input prompt passed to `LLM.generate` in order to better distinguish the `kwargs` to pass to the HF processor from the other arguments to...
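A minimal sketch of what such a wrapped prompt could look like. The names `TextPrompt`, `multi_modal_data`, and `mm_processor_kwargs` are illustrative assumptions here, not something the comment above specifies:

```python
from typing import Any, Optional, TypedDict


class TextPrompt(TypedDict, total=False):
    """Hypothetical wrapper separating the prompt text from processor inputs."""
    prompt: str                          # text fed to the tokenizer
    multi_modal_data: Optional[dict]     # e.g. {"image": <PIL.Image.Image>}
    mm_processor_kwargs: dict[str, Any]  # kwargs forwarded to the HF processor


# With the wrapper, processor kwargs are routed unambiguously to the HF
# processor instead of being mixed into LLM.generate's own arguments:
#
# outputs = llm.generate(
#     TextPrompt(
#         prompt="USER: <image>\nWhat is in the picture? ASSISTANT:",
#         multi_modal_data={"image": image},
#         mm_processor_kwargs={"num_crops": 4},
#     ),
#     sampling_params,
# )
```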
~~I have noticed that when using distributed inference on LLaVA-NeXT (#4199), there is a bug where the image tokens are not sent to the workers, resulting in an error when trying...
> I think we can refer to `get_config()` in `transformers_utils/config.py`, but search the registered processors first and then fall back to `AutoProcessor`, so that `get_processor()` could be:
>
> ```python
> def get_processor(model: str,...
> ```
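Following that suggestion, a rough sketch of such a helper. The `_PROCESSOR_REGISTRY` dict is a hypothetical stand-in for a separately maintained processor registry; only `AutoProcessor.from_pretrained` is an actual `transformers` API here:

```python
from transformers import AutoProcessor

# Hypothetical registry of custom processor classes, consulted before
# falling back to AutoProcessor (mirroring how get_config() works).
_PROCESSOR_REGISTRY: dict = {}


def get_processor(model: str, trust_remote_code: bool = False, **kwargs):
    """Return the registered processor for `model` if any, else use AutoProcessor."""
    if model in _PROCESSOR_REGISTRY:
        processor_cls = _PROCESSOR_REGISTRY[model]
        return processor_cls.from_pretrained(model, **kwargs)
    return AutoProcessor.from_pretrained(
        model, trust_remote_code=trust_remote_code, **kwargs)
```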
For reference, I have compiled a list of GH issues that are related to this topic (updated periodically):

Multi-modal core:
- Encoder-decoder support (cross-attention across text only) - #5934
- ...
> The current prompt format `"<image>" * 576 + prompt` makes the underlying implementation easier (especially when it comes to profiling), but complicates the user experience compared to the Hugging Face format...
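To make the contrast concrete, a small illustration, assuming the 576 repetitions correspond to LLaVA-1.5's image feature size and that `<image>` is the placeholder token (both inferred from context rather than stated in the quote):

```python
question = "What is shown in this image?"

# Early vLLM-style prompt: the placeholder is repeated once per image feature
# (576 for LLaVA-1.5), leaking an implementation detail into the prompt.
vllm_style_prompt = "<image>" * 576 + f"\nUSER: {question}\nASSISTANT:"

# Hugging Face-style prompt: a single placeholder that the processor expands
# internally, which is what users normally write.
hf_style_prompt = f"USER: <image>\n{question}\nASSISTANT:"
```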
Would be nice if #4794 were also made available via the CLI (perhaps `vllm batch`?).