ImageBind
Text/Audio/Image > Video/3D
Great Job!
Will it support Text/Audio/Image > Video/3D conversion, and if so, approximately when?
Thanks for your question. ImageBind learns a shared embedding space across modalities, so it supports retrieval across modalities. If by conversion you mean generation, ImageBind features can be fed to other generative models (e.g. Stable Diffusion), but ImageBind doesn't generate raw signals on its own.
Our model already supports video. Video features can be extracted by loading the clips with load_and_transform_video_data (https://github.com/facebookresearch/ImageBind/blob/0f8620b6678fd24c35f172721ea6046ab5780890/data.py#L297) and passing the result to the model under the key ModalityType.VISION, in the same way as images. Please let us know if you have any other questions.
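A rough sketch of the video path described above, modeled on the usage example in the repository README. The import paths assume the packaged `imagebind` layout and may differ at the linked commit; the wrapper `extract_video_embeddings` and the file name in the usage comment are hypothetical. Running it requires the ImageBind package, its dependencies, and the pretrained weights.

```python
def extract_video_embeddings(video_paths, device="cpu"):
    """Return one ImageBind embedding per video file (sketch, not the official API)."""
    # Imported inside the function so that defining it does not require
    # the heavy dependencies to be installed.
    import torch
    from imagebind import data  # assumed package layout
    from imagebind.models import imagebind_model
    from imagebind.models.imagebind_model import ModalityType

    model = imagebind_model.imagebind_huge(pretrained=True)
    model.eval()
    model.to(device)

    # Video clips go in under the VISION key, the same key used for images.
    inputs = {
        ModalityType.VISION: data.load_and_transform_video_data(video_paths, device),
    }

    with torch.no_grad():
        embeddings = model(inputs)

    return embeddings[ModalityType.VISION]


# Usage (hypothetical file name):
# video_embeddings = extract_video_embeddings(["clip.mp4"], device="cuda:0")
```

The returned embeddings can then be compared with text, audio, or image embeddings from the same model.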
Hi! For the Audio-to-Image generation demo showcased on the website, I'm wondering which generative model is used, and whether you plan to release the corresponding code. Thank you!
We have a quick application built on top of ImageBind: https://github.com/sail-sg/BindDiffusion. It wires up ImageBind with Stable Diffusion. Go ahead and give it a try.