ImageBind
ImageBind: One Embedding Space to Bind Them All
Thanks for the great work! I tried this: the audio embedding dimension of ImageBind is 1024, but the Detic model needs embeddings of dimension 512. Can you release a matching model, for example imagebind_base.pth?
If one only intends to use the audio, video, and text embeddings from ImageBind in a project intended for commercial use, would Meta AI allow such a use case?
Hey, I just want to know if sklearn's cosine_similarity can replace the softmax. Thanks.
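To make the question above concrete, here is a minimal NumPy sketch of how the two operations relate. It is not ImageBind's code: the embeddings are random placeholders, the helper names are hypothetical, and the scale factor of 20.0 is an arbitrary assumption. Cosine similarity produces raw scores in [-1, 1]; a softmax over those scores turns each row into a probability distribution without changing which candidate ranks first.

```python
import numpy as np

def cosine_similarity(a, b):
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Placeholder embeddings (assumption): 3 "images" and 4 "texts" of dimension 8.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=(3, 8))
text_emb = rng.normal(size=(4, 8))

# Raw similarity scores, one row per image, one column per text.
sims = cosine_similarity(image_emb, text_emb)

# Softmax over scaled scores; the scale 20.0 is an arbitrary assumption.
probs = softmax(sims * 20.0, axis=-1)

# Softmax is monotonic within a row, so the best match per image is unchanged.
same_ranking = np.array_equal(sims.argmax(axis=1), probs.argmax(axis=1))
```

So for retrieval or ranking, cosine similarity alone may be enough; the softmax matters when the scores should read as probabilities over a fixed set of candidates.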
Is there a way to get weights for a smaller model? @imisra
Can ImageBind generate images from a complex sentence like DALL-E does? For example, "a corgi playing a flame throwing trumpet". I have only seen a demo with a simple word.
Hi. I noticed that when the input text sequence is too long, it is truncated to 77 tokens. However, no EOT token is added at the end? For example,...
I do not know how to continue using the demo results.
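For reference, a minimal sketch of the behavior the question above seems to expect, where truncation keeps an end-of-text (EOT) token in the last slot. This is not ImageBind's tokenizer: `encode_with_eot` is a hypothetical helper, the input ids are placeholders, and the special-token ids 49406 and 49407 are assumed to follow the CLIP BPE vocabulary. CLIP-style text encoders commonly read the sentence feature at the EOT position, which is why a missing EOT after truncation may matter.

```python
SOT_TOKEN = 49406      # assumed start-of-text id (CLIP BPE vocabulary)
EOT_TOKEN = 49407      # assumed end-of-text id (CLIP BPE vocabulary)
CONTEXT_LENGTH = 77    # context length mentioned in the question

def encode_with_eot(token_ids, context_length=CONTEXT_LENGTH):
    """Wrap ids in SOT/EOT, truncate to context_length, keep EOT last, zero-pad."""
    tokens = [SOT_TOKEN] + list(token_ids) + [EOT_TOKEN]
    if len(tokens) > context_length:
        tokens = tokens[:context_length]
        tokens[-1] = EOT_TOKEN  # restore the EOT that truncation cut off
    # Pad short sequences with zeros up to the fixed context length.
    return tokens + [0] * (context_length - len(tokens))

# Placeholder ids (assumption): 100 tokens, longer than the context window.
long_ids = list(range(1000, 1100))
truncated = encode_with_eot(long_ids)

# A short sequence keeps its own EOT and is padded.
short = encode_with_eot([1000, 1001])
```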
Which decoder would work best to go from an image to an audio embedding, and then to the actual sound?
Wonderful work! In Table 2, the top-1 accuracy on ImageNet1k is 77.7%, which is higher than CLIP (OpenCLIP) by 2.2% (2.0%). But ImageBind did not train the vision encoder and text encoder,...