Question about Audio-to-Image Generation
An amazing work!!!
It's well known that https://github.com/lucidrains/DALLE2-pytorch and https://github.com/LAION-AI/dalle2-laion use open-clip as the pretrained text and image encoders. However, I noticed that you used a private DALLE-2 to generate images conditioned on audio.
Is it possible to use an open-source DALLE-2 instead of the private reimplementation? Are there any problems with the open-source DALLE-2? I would appreciate it if you could share your experience.
In my view, if an open-source DALLE-2 could be adapted to ImageBind, it could directly enable some very interesting applications and increase the impact of this work!
Can someone help me? Thanks!
We tried audio to image using Stable Diffusion. The project is open-sourced: https://github.com/sail-sg/BindDiffusion
Wow, great work, I have starred this repo!