Feature Request: Support for Qwen2-VL
Prerequisites
- [X] I am running the latest code. Mention the version if possible as well.
- [X] I carefully followed the README.md.
- [X] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [X] I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
Qwen just released Qwen2-VL 2B & 7B under the Apache 2.0 License.
Motivation
- SoTA understanding of images of various resolution & ratio: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc.
- Understanding videos of 20min+: Qwen2-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc.
Possible Implementation
No response
+1 This would be another great addition!
This model is awesome
I am looking forward to it very much
+1 I am looking forward to it very much
We can try Llamafing it
+1
+1
+1
+1
+1
+1
+1
+1
Any updates?
+1
+1
+1
+1
+1
+1
I cannot wait for it!!!
Maybe people should also express interest and ask the Qwen2-VL devs to implement it: https://github.com/QwenLM/Qwen2-VL/issues/7
Hoping to use llama.cpp for on-device (end-side) inference.
Is anyone already working on this? If not, I would like to give it a try.
+1 Are there any updates?
+1
+1
+1
+1
Could everybody please stop the +1 comments and instead just give a 👍🏽 on the first post? The +1 posts just add noise and make following any real updates annoying...