Lighter InternVL for contrastive use only
Hi, I am a student working on joint embedding representation learning of text and images. I came across your model and am trying to load and run it. However, the model is very heavy and my computational resources (2× NVIDIA T4) are not enough to load it. Is there a way to load only the encoders and QLLaMA (instead of the whole model including the LLM), or is it possible to make the model smaller in some way so that I can load it?
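To make the question concrete, this is the kind of reduced loading I have in mind: just a sketch using standard `transformers` options (fp16, sharding the weights across the two T4s with `device_map="auto"`, optionally 8-bit quantization via bitsandbytes). I have not verified that the custom InternVL code accepts these options or that the checkpoint actually fits this way.

```python
import torch
from transformers import AutoModel, BitsAndBytesConfig

model_id = "OpenGVLab/InternVL-14B-224px"  # substitute the InternVL checkpoint I am actually loading

USE_8BIT = False  # flip to True to also quantize to 8-bit via bitsandbytes

load_kwargs = dict(
    low_cpu_mem_usage=True,
    device_map="auto",        # shard the layers across both T4s (requires `accelerate`)
    trust_remote_code=True,
)
if USE_8BIT:
    load_kwargs["quantization_config"] = BitsAndBytesConfig(load_in_8bit=True)
else:
    load_kwargs["torch_dtype"] = torch.float16  # half precision instead of fp32

model = AutoModel.from_pretrained(model_id, **load_kwargs).eval()
```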
Many thanks!
May I ask, are you trying to use this model weight: https://huggingface.co/OpenGVLab/InternVL-14B-224px?
InternVL-14B-224px is the only CLIP-like model we have released so far, and there are currently no plans to develop a smaller CLIP model.
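For the contrastive / joint-embedding use case, whichever encoders end up fitting on the T4s, the downstream step is the usual CLIP-style similarity over L2-normalized image and text embeddings. Below is a minimal sketch with stand-in tensors; the batch size, embedding dimension, and temperature are arbitrary illustrative values, not InternVL-specific.

```python
import torch
import torch.nn.functional as F

# Stand-in embeddings: in practice these would come from the image encoder
# and the text encoder, one row per sample.
batch_size, embed_dim = 8, 768
image_embeds = F.normalize(torch.randn(batch_size, embed_dim), dim=-1)
text_embeds = F.normalize(torch.randn(batch_size, embed_dim), dim=-1)

# CLIP-style logits: scaled pairwise cosine similarities between all images and texts.
logit_scale = 1 / 0.07  # typical temperature (learnable in CLIP-style training)
logits_per_image = logit_scale * image_embeds @ text_embeds.t()
logits_per_text = logits_per_image.t()

# Symmetric InfoNCE loss: the i-th image should match the i-th text.
targets = torch.arange(batch_size)
loss = (F.cross_entropy(logits_per_image, targets) +
        F.cross_entropy(logits_per_text, targets)) / 2
print(loss.item())
```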
Since there hasn't been much activity for a while, I will close this issue. If you have any further questions, please feel free to reopen it.