ImageBind issues

Results 81 ImageBind issues


Simply replacing Detic's CLIP-based 'class' embedding with ImageBind audio embedding

Thanks for your good work! I tried this: the ImageBind audio embedding has dimension 1024, but the Detic model needs a 512-dimensional embedding. Can you release a matching model, for example imagebind_base.pth?

youngstear

ImageBind for commercial purposes

If one only intends to use the audio, video, and text embeddings from ImageBind in a project intended for commercial use, would Meta AI allow such a use case?

abhimanyu891998

Is it OK to use cosine_similarity instead of softmax for VISION x TEXT?

Hey, I just want to know if sklearn's cosine_similarity can replace the softmax. Thanks.

XinyueZ
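
On the cosine_similarity versus softmax question above, the two are not interchangeable in what they return: cosine similarity gives raw scores in [-1, 1], while softmax turns a row of scores into probabilities that sum to 1. A minimal pure-Python sketch of the difference follows. The embedding values and the `temperature` scale are made-up assumptions for illustration, not ImageBind outputs or settings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy stand-ins for one image embedding and three text embeddings
# (assumption: made-up values, not real model outputs).
image_embedding = [0.9, 0.1, 0.2]
text_embeddings = [
    [1.0, 0.0, 0.1],
    [0.0, 1.0, 0.0],
    [0.2, 0.2, 1.0],
]

# Raw cosine scores: one number per text candidate.
similarities = [cosine_similarity(image_embedding, t) for t in text_embeddings]

# Softmax over the scaled scores: a probability distribution over candidates.
temperature = 0.05  # hypothetical scale, chosen only for illustration
probabilities = softmax([s / temperature for s in similarities])

best_by_similarity = max(range(len(similarities)), key=similarities.__getitem__)
best_by_probability = max(range(len(probabilities)), key=probabilities.__getitem__)

print(similarities)
print(probabilities)
print(best_by_similarity == best_by_probability)
```

Because softmax is monotonic, the top match for a given image is the same under both when softmax is applied to those same cosine scores. If softmax is instead applied to dot products of unnormalized embeddings, the ranking can differ from the cosine ranking.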

Checkpoints for small/medium model

Is there a way to get weights for a smaller model? @imisra

abhimanyu891998

DALLE-2 Image generation

Can ImageBind generate images from a complex sentence like DALLE? For example, "a corgi playing a flame throwing trumpet". I have only seen a demo with a single word.

HoyeonM

No EOT when long sequence is truncated?

Hi. I noticed that when the input text sequence is too long, truncation is performed to reduce the sequence to 77 tokens. However, no EOT token is added at the end? For example,...

bakachan19
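
The truncation concern above can be sketched as follows. CLIP-style text encoders typically read the sentence feature at the EOT position, so a truncated sequence that loses its EOT token may be pooled at an unintended position. This is a minimal sketch, not ImageBind's tokenizer: `truncate_with_eot` is a hypothetical helper, and the token IDs are assumed to be those of the CLIP BPE vocabulary.

```python
SOT_TOKEN = 49406    # assumption: start-of-text id in the CLIP BPE vocabulary
EOT_TOKEN = 49407    # assumption: end-of-text id in the CLIP BPE vocabulary
CONTEXT_LENGTH = 77  # context length mentioned in the issue

def truncate_with_eot(token_ids, context_length=CONTEXT_LENGTH, eot_token=EOT_TOKEN):
    """Hypothetical helper: cut to context_length, keep EOT last, zero-pad."""
    if len(token_ids) > context_length:
        # Slicing copies the list, so the caller's list is not modified.
        token_ids = token_ids[:context_length]
        # Overwrite the final slot so the sequence still ends with EOT.
        token_ids[-1] = eot_token
    # Pad short sequences with zeros up to the fixed context length.
    return token_ids + [0] * (context_length - len(token_ids))

# A sequence of 102 tokens: SOT, 100 dummy word ids, EOT.
long_sequence = [SOT_TOKEN] + list(range(1, 101)) + [EOT_TOKEN]
truncated = truncate_with_eot(long_sequence)

# A short sequence is only padded; its EOT stays where it was.
short_sequence = [SOT_TOKEN, 5, 6, EOT_TOKEN]
padded = truncate_with_eot(short_sequence)

print(len(truncated), truncated[0], truncated[-1])
print(len(padded), padded[:5])
```

The trade-off is that the last real token of the truncated text is replaced by EOT, so one token of content is lost in exchange for a well-formed sequence.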

How to use ImageBind for text and image recognition and retrieval

I do not know how to go on using the demo results.

guochunjiang

Usage is not clear: example usages for generating images from other images + text would be great.

celster

Image to audio

Which decoder would work best to go from an image to an audio embedding and then to the actual sound?

delamarifer

Confused about ImageNet1k results

Wonderful work! In Table 2, the top-1 accuracy on ImageNet1k is 77.7%, which is higher than CLIP (OpenCLIP) by 2.2% (2.0%). But ImageBind did not train the vision encoder and text encoder,...

LinB203
