Lighter InternVL for contrastive use only
Hi, I am a student working on joint embedding representation learning of text and images. I came across your model and am trying to load and run it. However, the model is very heavy and my computational resources (2× NVIDIA T4) are not enough to load it. Is there a way to load only the encoders and QLLaMA (instead of the whole model including the LLM), or is it possible to make the model smaller in some way so that I can load it?
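To make the question concrete, this is the kind of reduced loading I have in mind: just a sketch using standard `transformers` options (fp16, sharding the weights across the two T4s with `device_map="auto"`, optionally 8-bit quantization via bitsandbytes). I have not verified that the custom InternVL code accepts these options or that the checkpoint actually fits this way.

```python
import torch
from transformers import AutoModel, BitsAndBytesConfig

model_id = "OpenGVLab/InternVL-14B-224px"  # substitute the InternVL checkpoint I am actually loading

USE_8BIT = False  # flip to True to also quantize to 8-bit via bitsandbytes

load_kwargs = dict(
    low_cpu_mem_usage=True,
    device_map="auto",        # shard the layers across both T4s (requires `accelerate`)
    trust_remote_code=True,
)
if USE_8BIT:
    load_kwargs["quantization_config"] = BitsAndBytesConfig(load_in_8bit=True)
else:
    load_kwargs["torch_dtype"] = torch.float16  # half precision instead of fp32

model = AutoModel.from_pretrained(model_id, **load_kwargs).eval()
```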
Many thanks!
May I ask, are you trying to use this model weight: https://huggingface.co/OpenGVLab/InternVL-14B-224px?
InternVL-14B-224px is the only CLIP-like model we have released so far, and there are currently no plans to develop a smaller CLIP model.
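For the contrastive / joint-embedding use case, whichever encoders end up fitting on the T4s, the downstream step is the usual CLIP-style similarity over L2-normalized image and text embeddings. Below is a minimal sketch with stand-in tensors; the batch size, embedding dimension, and temperature are arbitrary illustrative values, not InternVL-specific.

```python
import torch
import torch.nn.functional as F

# Stand-in embeddings: in practice these would come from the image encoder
# and the text encoder, one row per sample.
batch_size, embed_dim = 8, 768
image_embeds = F.normalize(torch.randn(batch_size, embed_dim), dim=-1)
text_embeds = F.normalize(torch.randn(batch_size, embed_dim), dim=-1)

# CLIP-style logits: scaled pairwise cosine similarities between all images and texts.
logit_scale = 1 / 0.07  # typical temperature (learnable in CLIP-style training)
logits_per_image = logit_scale * image_embeds @ text_embeds.t()
logits_per_text = logits_per_image.t()

# Symmetric InfoNCE loss: the i-th image should match the i-th text.
targets = torch.arange(batch_size)
loss = (F.cross_entropy(logits_per_image, targets) +
        F.cross_entropy(logits_per_text, targets)) / 2
print(loss.item())
```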
Since there hasn't been much activity for a while, I will close this issue. If you have any further questions, please feel free to reopen it.