ImageBind [Help] How can I generate images or audio?

Open burakmartin opened this issue 1 year ago • 4 comments

Hey, could someone explain to me (I have no AI/ML background) how this model could be used to generate images or audio? I can generate 3 x 3 tensors in code, no problem, but what's the next step to leverage these tensors?

I'm pretty sure I'm not the only one who will stand here and think to himself: "what now?" I would appreciate a hint or anything that would explain how I could use these tensors without having to read the paper (which I tried but didn't really grasp).

May 12 '23 07:05 burakmartin
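
A note on reading those tensors: ImageBind outputs embeddings, not images or audio, so on its own it supports matching and retrieval rather than generation. The sketch below assumes the 3 x 3 tensors are row-wise softmax similarity matrices between three inputs of one modality and three of another (the original post does not say so explicitly). The toy vectors, labels, and helper names are made up for illustration and stand in for real model outputs.

```python
import math

def softmax(row):
    """Turn raw similarity scores into probabilities that sum to 1."""
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def similarity_matrix(queries, candidates):
    """One row per query, one column per candidate, softmax over each row."""
    return [softmax([dot(q, c) for c in candidates]) for q in queries]

def best_matches(matrix, labels):
    """For each row, return the label of the highest-scoring column."""
    return [labels[max(range(len(row)), key=row.__getitem__)] for row in matrix]

# Hypothetical embeddings: in practice these would come from the model.
image_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
text_embeddings = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
text_labels = ["a dog", "a car", "a bird"]

scores = similarity_matrix(image_embeddings, text_embeddings)  # 3 x 3
matches = best_matches(scores, text_labels)
print(matches)
```

Each row of `scores` answers "which text best describes this image?", so the largest entry in a row is the retrieved match. Producing actual images or audio would need a separate generative model that accepts these embeddings as conditioning.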

Same here, I just need some examples.

May 13 '23 06:05 WilTay1

Yeah, I need them too :)

May 13 '23 14:05 chjayakrishnajk

Same. I am also interested in an example for the embedding space arithmetic showcased in Figure 4 of the paper where they retrieve an image using an image and audio.

May 15 '23 13:05 bakachan19
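
For the embedding arithmetic mentioned above, here is a minimal sketch, assuming the idea is to add a normalized image embedding and a normalized audio embedding and then rank a gallery of image embeddings by cosine similarity to the sum. This is not the authors' code; the vectors, gallery names, and helper functions are hypothetical, and real embeddings would come from the model.

```python
import math

def normalize(v):
    """Scale a vector to unit length (assumes it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def combine(a, b):
    """Add two unit-normalized embeddings and renormalize the sum."""
    return normalize([x + y for x, y in zip(normalize(a), normalize(b))])

def rank(query, gallery):
    """Sort gallery items by cosine similarity to the query, best first."""
    q = normalize(query)
    scored = [
        (name, sum(x * y for x, y in zip(q, normalize(vec))))
        for name, vec in gallery.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical 3-d embeddings standing in for real model outputs.
image_emb = [1.0, 0.0, 0.0]  # query image
audio_emb = [0.0, 1.0, 0.0]  # query audio clip

gallery = {
    "image_only": [1.0, 0.0, 0.1],  # shares content with the image only
    "audio_only": [0.0, 1.0, 0.1],  # shares content with the audio only
    "both": [0.7, 0.7, 0.0],        # shares content with both queries
    "unrelated": [0.0, 0.0, 1.0],
}

query = combine(image_emb, audio_emb)
ranking = rank(query, gallery)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With these toy vectors, the gallery item that shares content with both queries ranks first, which is the behaviour the composed query is meant to produce.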

You may find ViT-Lens of interest; it works with an MLLM to generate text or images from other modalities :)

Jan 09 '24 05:01 StanLei52
