Cyrus Leung
> Regarding #4228, I think there may be a situation where some MM models don't have a Processor implemented.
>
> In this case, we would have to refactor...
Also, I think that we should wrap the input prompt passed to `LLM.generate` in order to better distinguish the `kwargs` to pass to the HF processor from the other arguments to...
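A minimal sketch of what such a wrapped prompt could look like. The names `TextPrompt`, `multi_modal_data`, and `mm_processor_kwargs` are illustrative assumptions here, not something the comment above specifies:

```python
from typing import Any, Optional, TypedDict


class TextPrompt(TypedDict, total=False):
    """Hypothetical wrapper separating the prompt text from processor inputs."""
    prompt: str                          # text fed to the tokenizer
    multi_modal_data: Optional[dict]     # e.g. {"image": <PIL.Image.Image>}
    mm_processor_kwargs: dict[str, Any]  # kwargs forwarded to the HF processor


# With the wrapper, processor kwargs are routed unambiguously to the HF
# processor instead of being mixed into LLM.generate's own arguments:
#
# outputs = llm.generate(
#     TextPrompt(
#         prompt="USER: <image>\nWhat is in the picture? ASSISTANT:",
#         multi_modal_data={"image": image},
#         mm_processor_kwargs={"num_crops": 4},
#     ),
#     sampling_params,
# )
```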
~~I have noticed that when using distributed inference on LLaVA-NeXT (#4199), there is a bug where the image tokens are not sent to the workers, resulting in an error when trying...
> I think we can refer to `get_config()` in `transformers_utils/config.py`, but search the registered processors first and then fall back to `AutoProcessor`, so that `get_processor()` could be:
>
> ```python
> def get_processor(model: str,...
> ```
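Following that suggestion, a rough sketch of such a helper. The `_PROCESSOR_REGISTRY` dict is a hypothetical stand-in for a separately maintained processor registry; only `AutoProcessor.from_pretrained` is an actual `transformers` API here:

```python
from transformers import AutoProcessor

# Hypothetical registry of custom processor classes, consulted before
# falling back to AutoProcessor (mirroring how get_config() works).
_PROCESSOR_REGISTRY: dict = {}


def get_processor(model: str, trust_remote_code: bool = False, **kwargs):
    """Return the registered processor for `model` if any, else use AutoProcessor."""
    if model in _PROCESSOR_REGISTRY:
        processor_cls = _PROCESSOR_REGISTRY[model]
        return processor_cls.from_pretrained(model, **kwargs)
    return AutoProcessor.from_pretrained(
        model, trust_remote_code=trust_remote_code, **kwargs)
```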
For reference, I have compiled a list of GH issues that are related to this topic (updated periodically):

Multi-modal core:
- Encoder-decoder support (cross-attention across text only) - #5934
- ...
> The current prompt format `"<image>" * 576 + prompt` makes the underlying implementation easier (especially when it comes to profiling), but complicates the user experience compared to the Hugging Face format...
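To make the contrast concrete, a small illustration, assuming the 576 repetitions correspond to LLaVA-1.5's image feature size and that `<image>` is the placeholder token (both inferred from context rather than stated in the quote):

```python
question = "What is shown in this image?"

# Early vLLM-style prompt: the placeholder is repeated once per image feature
# (576 for LLaVA-1.5), leaking an implementation detail into the prompt.
vllm_style_prompt = "<image>" * 576 + f"\nUSER: {question}\nASSISTANT:"

# Hugging Face-style prompt: a single placeholder that the processor expands
# internally, which is what users normally write.
hf_style_prompt = f"USER: <image>\n{question}\nASSISTANT:"
```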
Would be nice if #4794 were also made available via the CLI (perhaps `vllm batch`?).