
[Help] How can I generate images or audio?

Open burakmartin opened this issue 1 year ago • 4 comments

Hey, could someone explain to me (no AI/ML background) how this model could be used to generate images or audio? I can generate 3 x 3 tensors in code, no problem, but what's the next step to leverage these tensors?

I'm pretty sure I'm not the only one who will stand here and think to himself: "what now?" I would appreciate a hint or anything that would explain how I could use these tensors without having to read the paper (which I tried but didn't really grasp).

burakmartin avatar May 12 '23 07:05 burakmartin

Same here, I just need some examples.

WilTay1 avatar May 13 '23 06:05 WilTay1

Yeah, I need them too :)

chjayakrishnajk avatar May 13 '23 14:05 chjayakrishnajk

Same. I am also interested in an example for the embedding space arithmetic showcased in Figure 4 of the paper where they retrieve an image using an image and audio.
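As a rough sketch of what that kind of embedding arithmetic might look like: assuming it amounts to adding a normalized image embedding and a normalized audio embedding, then retrieving the gallery image with the highest cosine similarity to the sum. The vectors below are made-up 3-d toy values, not real ImageBind outputs, and `normalize`, `combine`, and `retrieve` are hypothetical helper names, not part of the ImageBind code.

```python
import math

def normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def combine(a, b):
    # Add two unit-length embeddings and renormalize the result.
    a, b = normalize(a), normalize(b)
    return normalize([x + y for x, y in zip(a, b)])

def retrieve(query, gallery):
    # Score every gallery embedding by cosine similarity to the query
    # and return the index of the best match along with all scores.
    q = normalize(query)
    scores = [sum(x * y for x, y in zip(q, normalize(g))) for g in gallery]
    best = max(range(len(gallery)), key=scores.__getitem__)
    return best, scores

# Toy stand-ins for embeddings (assumed values, for illustration only).
image_emb = [1.0, 0.0, 0.0]  # pretend: embedding of a query image
audio_emb = [0.0, 1.0, 0.0]  # pretend: embedding of a query sound

gallery = [
    [1.0, 0.1, 0.0],  # 0: resembles the image only
    [0.1, 1.0, 0.0],  # 1: resembles the audio only
    [0.7, 0.7, 0.0],  # 2: resembles both
    [0.0, 0.0, 1.0],  # 3: unrelated
]

query = combine(image_emb, audio_emb)
best, scores = retrieve(query, gallery)
print(best, [round(s, 3) for s in scores])
```

With these toy values the combined query lands closest to gallery item 2, the one that resembles both inputs. With real embeddings the gallery would hold the model's image embeddings for a set of candidate images, and this only retrieves an existing image; it does not generate a new one.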

bakachan19 avatar May 15 '23 13:05 bakachan19

You may find ViT-Lens of interest; it works with an MLLM to generate text or images from other modalities :)

StanLei52 avatar Jan 09 '24 05:01 StanLei52