text-generation-webui

Request for Multimodal Pipeline Support in LLaVA 1.6

Open yoshuzx opened this issue 1 year ago • 2 comments

With the recent release of LLaVA 1.6, are there any plans to add multimodal pipeline support for version 1.6 to the Oobabooga multimodal extension? It would be very helpful to be able to use the extension with 1.6, as is already possible with the previous LLaVA 1.5 model.

Thank you in advance for your consideration.

yoshuzx • Jan 31 '24 21:01

LLaVA 1.6 has some really neat features that might not carry over directly to the existing multimodal framework, like the way it dices up large images as part of its preprocessing.

If an expert could weigh in on the required steps, that might help other devs get an idea of the changes necessary and contribute some of the lighter parts of the implementation.

LLaVA 1.6 support is also a feature I am very eager to see implemented!

tim-win • Feb 05 '24 15:02

I agree. I don't understand why support for Qwen and LLaVA hasn't been added yet.

Fenfel • Feb 12 '24 11:02

This issue has been closed due to inactivity for 2 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

github-actions[bot] • Apr 13 '24 23:04

@oobabooga

Fenfel • Apr 13 '24 23:04