ImageBind Text/Audio/Image

Text/Audio/Image > Video/3D

Open becauseofAI opened this issue 1 year ago • 3 comments

Great Job!

Will it support Text/Audio/Image > Video/3D conversion? If so, approximately when?

May 10 '23 03:05 becauseofAI

Thanks for your question. ImageBind learns a shared embedding space across modalities, so it supports cross-modal retrieval. If by conversion you mean generation, ImageBind features can be fed to other generative models (e.g. Stable Diffusion), but ImageBind doesn't generate raw signals on its own.
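To illustrate what retrieval in a shared embedding space means, here is a minimal sketch in plain Python. The vectors are made-up toy values standing in for real embeddings, and `cosine_similarity` and `retrieve` are hypothetical helper names, not part of ImageBind. Cosine similarity is used here as a simple ranking score; it is an assumption of this sketch, not necessarily how any given ImageBind demo scores matches.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query, candidates):
    """Return candidate names sorted from most to least similar to the query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in candidates.items()]
    return [name for _, name in sorted(scored, reverse=True)]


# Toy 3-d vectors (hypothetical); real embeddings would come from the model.
audio_query = [0.9, 0.1, 0.0]  # stands in for an audio clip's embedding
image_candidates = {
    "dog.jpg": [0.8, 0.2, 0.1],
    "car.jpg": [0.1, 0.9, 0.2],
    "bird.jpg": [0.0, 0.2, 0.9],
}

ranking = retrieve(audio_query, image_candidates)
print(ranking)
```

Because all modalities land in the same space, the query and the candidates can come from different modalities (audio against images here) and the ranking code stays the same.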

Our models already support video. Video features can be extracted using load_and_transform_video_data https://github.com/facebookresearch/ImageBind/blob/0f8620b6678fd24c35f172721ea6046ab5780890/data.py#L297 and passing the inputs to the model under the key ModalityType.VISION, in the same way as for images. Please let us know if you have any other questions.
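A rough sketch of the video path described above, modeled on the image example in the repository README. The import paths assume the packaged `imagebind` layout; at the commit linked above the equivalents were `import data` and `from models import imagebind_model`. `embed_videos` is a hypothetical wrapper name, and calling it downloads the pretrained weights, so treat this as untested illustration rather than a reference implementation.

```python
def embed_videos(video_paths, device="cpu"):
    """Return one ImageBind embedding per video path (illustrative sketch)."""
    # Imports are local so this file can be loaded without ImageBind installed.
    import torch
    # Assumption: packaged module layout; adjust to match your checkout.
    from imagebind import data
    from imagebind.models import imagebind_model
    from imagebind.models.imagebind_model import ModalityType

    # Load the pretrained model and put it in inference mode.
    model = imagebind_model.imagebind_huge(pretrained=True)
    model.eval()
    model.to(device)

    # Videos go in under the VISION key, the same key used for images.
    inputs = {
        ModalityType.VISION: data.load_and_transform_video_data(video_paths, device),
    }

    with torch.no_grad():
        embeddings = model(inputs)

    return embeddings[ModalityType.VISION]
```

Usage would look like `embed_videos(["clip.mp4"], device="cuda:0")`, where `clip.mp4` is a placeholder path. The returned embeddings can then be compared with text or audio embeddings from the same model.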

May 10 '23 11:05 aelnouby

Hi! Then for the demo of Audio-to-Image generation showcased on the website, I'm wondering which generative model is used, and whether you plan to release the corresponding code. Thank you!

May 11 '23 15:05 FionaFAN22

We have a quick application built on top of ImageBind: https://github.com/sail-sg/BindDiffusion It wires ImageBind up with Stable Diffusion. Go ahead and give it a try.

May 16 '23 07:05 yzhwang
