
Vision Support

Open ambroser53 opened this issue 2 years ago • 1 comment

Are there any plans to integrate image input into LMQL? With the new GPT-4V and lightweight open-source vision-language models such as MPlug-Owl, it would be incredibly useful. I work with MPlug-Owl quite a lot, so I would be happy to investigate this if someone could point me in the right direction on where to start. A discussion of how multi-modal prompts are typically constructed for open-source models could help get this working for more than just GPT-4.

ambroser53 avatar Nov 07 '23 15:11 ambroser53

We definitely want to support this. There has already been some discussion about this on Discord. See the #dev channel for more details.

lbeurerkellner avatar Nov 10 '23 21:11 lbeurerkellner