LanguageBind
Any plans to use Long-CLIP to extend text input token limit?
If I read your paper correctly, you have frozen the CLIP text encoder and only aligned the other modalities to it. Do you think a pretrained Long-CLIP model could be used as a drop-in replacement for the text encoder in LanguageBind to extend the token limit?
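To illustrate the question, here is a minimal sketch (not the actual LanguageBind or Long-CLIP APIs; `StubTextEncoder` and the module internals are stand-ins). The assumption behind the swap is that a drop-in replacement only has to preserve the shape of the shared embedding space the other modality encoders were aligned to, while relaxing the 77-token context limit that CLIP imposes (Long-CLIP extends this to 248 tokens):

```python
# Hypothetical sketch, NOT real LanguageBind/Long-CLIP code: a CLIP-style
# text tower maps token ids to one pooled embedding. If Long-CLIP keeps
# the same output dimensionality, only the context length changes.
import torch
import torch.nn as nn

EMBED_DIM = 768        # shared embedding dim assumed here (CLIP ViT-L/14 text tower)
CLIP_MAX_TOKENS = 77   # CLIP's context length
LONG_MAX_TOKENS = 248  # Long-CLIP's extended context length

class StubTextEncoder(nn.Module):
    """Stand-in for a CLIP-style text encoder: token ids -> pooled vector."""
    def __init__(self, max_tokens: int, embed_dim: int = EMBED_DIM):
        super().__init__()
        self.max_tokens = max_tokens
        self.embed = nn.EmbeddingBag(49408, embed_dim)  # CLIP BPE vocab size

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Enforce the context limit, as CLIP's positional embeddings do.
        if token_ids.shape[1] > self.max_tokens:
            raise ValueError("input exceeds context length")
        return self.embed(token_ids)

clip_text = StubTextEncoder(CLIP_MAX_TOKENS)
long_text = StubTextEncoder(LONG_MAX_TOKENS)

# A 200-token prompt: rejected by the CLIP-length tower,
# accepted by the Long-CLIP-length tower with the same output shape.
long_prompt = torch.randint(0, 49408, (1, 200))
try:
    clip_text(long_prompt)
except ValueError as e:
    print("CLIP tower:", e)
print("Long tower output shape:", tuple(long_text(long_prompt).shape))
```

Since the output shape is unchanged, the frozen video/audio/depth embeddings could in principle still be compared against the new text embeddings; whether alignment quality survives the swap is the open question.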