Interesting idea

Open kllgjc opened this issue 10 months ago • 1 comment

Could you do something like this with Qwen 2.5 VL (or InternVL2.5) to make a multimodal vector embedding model? I'm dumb, so I couldn't do it myself, but I'm sure smart people like you all could!

Mar 10 '25 02:03 kllgjc

Nvm, somebody already thought about it...

https://github.com/TIGER-AI-Lab/VLM2Vec

Mar 10 '25 02:03 kllgjc