vilio

Using Image and text together for classification

Open karndeepsingh opened this issue 3 years ago • 1 comment

Hi, I want to train a multi-modal model using image and text for multi-label classification.

Can you please help me understand which recent multi-modal models take image and text as input and can be fine-tuned on my classification task?

Looking forward to your reply.

thanks

karndeepsingh, Jun 23 '22 05:06

Hi, you can find a list of multi-modal models implemented in this codebase here

Muennighoff, Jun 23 '22 10:06