text-generation-webui
Request for Multimodal Pipeline Support for LLaVA 1.6
With the recent release of LLaVA 1.6, are there any plans to add a 1.6 pipeline to the Oobabooga multimodal extension? It would be very helpful to be able to use the multimodal extension with 1.6, as is already possible with the earlier LLaVA 1.5 model.
Thank you in advance for your consideration.
LLaVA 1.6 has some really neat features that might not carry over directly to the existing multimodal framework, like the way it dices up large images as part of its preprocessing.
If an expert could weigh in on the required steps, that might help other devs get an idea of the changes necessary and take on some of the lighter parts of the implementation.
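To make the image-dicing point a bit more concrete, here is a rough sketch of the general idea: pick a target canvas from a short list of candidate resolutions, then cut that canvas into fixed-size tiles. This is only an illustration, not the LLaVA or text-generation-webui code. The tile size, the candidate list, and the names `pick_canvas` and `tile_boxes` are all assumptions made up for this example. As far as I understand, the real preprocessing also resizes and pads the image and adds a downscaled overview of the whole picture, which is left out here.

```python
# Illustrative sketch only: TILE, CANDIDATES and both function names are
# assumptions for this example, not an actual LLaVA or webui API.
TILE = 336  # assumed edge length of one square tile, in pixels

# Assumed candidate canvas sizes (width, height), each a multiple of TILE.
CANDIDATES = [(336, 672), (672, 336), (672, 672), (1008, 336), (336, 1008)]


def pick_canvas(size, candidates=CANDIDATES):
    """Pick the candidate canvas that keeps the most image detail.

    Ties are broken by choosing the canvas with the least unused area.
    Integer arithmetic is used so the result does not depend on float rounding.
    """
    w, h = size
    best, best_eff, best_waste = None, -1, None
    for cw, ch in candidates:
        # Fit the image inside the canvas while keeping its aspect ratio.
        if cw * h <= ch * w:       # width is the limiting side
            sw, sh = cw, h * cw // w
        else:                      # height is the limiting side
            sw, sh = w * ch // h, ch
        eff = min(sw * sh, w * h)  # pixels of real detail that survive
        waste = cw * ch - eff      # canvas area not carrying image detail
        if eff > best_eff or (eff == best_eff and waste < best_waste):
            best, best_eff, best_waste = (cw, ch), eff, waste
    return best


def tile_boxes(canvas, tile=TILE):
    """Return (left, top, right, bottom) crop boxes covering the canvas, row by row."""
    cw, ch = canvas
    return [(x, y, x + tile, y + tile)
            for y in range(0, ch, tile)
            for x in range(0, cw, tile)]


if __name__ == "__main__":
    canvas = pick_canvas((1000, 500))
    print(canvas, tile_boxes(canvas))
```

The point for the extension is that a single input image turns into a variable number of tiles, each of which would need its own pass through the vision encoder, whereas the existing 1.5 setup, as described above, does not do this kind of dicing.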
LLaVA 1.6 support is a feature I am also very eager to see implemented!
I agree. I don't understand why they haven't added support for Qwen and LLaVA yet.
This issue has been closed due to inactivity for 2 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
@oobabooga