attention-interpolation-diffusion
Interpolation Between Text-to-Image Generation!
PAID: (Prompt-guided) Attention Interpolation of Text-to-Image Diffusion
He Qiyuan¹, Wang Jinghao², Liu Ziwei², Angela Yao¹ ✉
¹ Computer Vision & Machine Learning Group, National University of Singapore
² S-Lab, Nanyang Technological University
✉ Corresponding Author
📌 Release
[03/2024] Code and paper are publicly available.
📑 Abstract
TL;DR: AID (Attention Interpolation via Diffusion) is a training-free method that enables a text-to-image diffusion model to generate interpolations between different conditions with high consistency, smoothness, and fidelity. Its variant, PAID, provides further control over the interpolation via prompt guidance.
▶️ PAID Results
🏍️ Google Colab
Try PAID directly with Stable Diffusion 2.1 or SDXL on Google's free GPU!
🚗 Local Setup using Jupyter Notebook
- Clone the repository and install the requirements:
git clone https://github.com/QY-H00/attention-interpolation-diffusion.git
cd attention-interpolation-diffusion
pip install -r requirements.txt
- Go to play.ipynb or play_sdxl.ipynb for fun!
🛳️ Local Setup using Gradio
- Install Gradio:
pip install gradio
- Launch the Gradio interface:
gradio gradio_src/app.py
🎲 Customized Interpolation
Our method offers customizable configurations, letting users adjust settings freely to obtain a wide range of interpolation results. Here are some examples:
Prompt guidance
1. "A dog driving car"
2. "A car with dog furry texture"
3. "A toy named dog-car"
4. "A painting of car and dog drawn by Vincent van Gogh"
$\alpha$ and $\beta$ of the Beta prior
1. $\alpha=1, \beta=1$
2. $\alpha=1, \beta=8$
3. $\alpha=8, \beta=1$
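To build intuition for how $\alpha$ and $\beta$ skew the result, the sketch below computes evenly spaced quantiles of a Beta distribution for the three settings listed above. It is only an illustration: the function names `beta_quantile` and `interpolation_coefficients` are hypothetical, and treating quantiles as interpolation coefficients is an assumption, so the repository's actual selection procedure may differ. The closed-form inverse CDF used here holds only when $\alpha=1$ or $\beta=1$, which covers all three examples.

```python
from typing import List


def beta_quantile(p: float, alpha: float, beta: float) -> float:
    """Inverse CDF of Beta(alpha, beta); closed form only if alpha == 1 or beta == 1."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if beta == 1:
        # CDF of Beta(alpha, 1) is x**alpha, so the quantile is p**(1/alpha).
        return p ** (1.0 / alpha)
    if alpha == 1:
        # CDF of Beta(1, beta) is 1 - (1 - x)**beta.
        return 1.0 - (1.0 - p) ** (1.0 / beta)
    raise NotImplementedError("closed form requires alpha == 1 or beta == 1")


def interpolation_coefficients(n: int, alpha: float, beta: float) -> List[float]:
    """Hypothetical helper: n coefficients in [0, 1] at evenly spaced Beta quantiles."""
    if n < 2:
        raise ValueError("need at least two coefficients (the two endpoints)")
    return [beta_quantile(i / (n - 1), alpha, beta) for i in range(n)]


# Compare the three (alpha, beta) settings from the list above.
for a, b in [(1, 1), (1, 8), (8, 1)]:
    coeffs = interpolation_coefficients(5, a, b)
    print(f"alpha={a}, beta={b}:", [round(c, 3) for c in coeffs])
```

With $\alpha=1, \beta=1$ the coefficients are uniformly spaced. With $\alpha=1, \beta=8$ the interior coefficients cluster near 0 (the first condition), and with $\alpha=8, \beta=1$ they cluster near 1 (the second condition).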
📝 Supported Models
Model Name | Link |
---|---|
Stable Diffusion 1.4-512 | CompVis/stable-diffusion-v1-4 |
Stable Diffusion 1.5-512 | runwayml/stable-diffusion-v1-5 |
Stable Diffusion 2.1-768 | stabilityai/stable-diffusion-2-1 |
Stable Diffusion XL-1024 | stabilityai/stable-diffusion-xl-base-1.0 |
Animagine XL 3.1 | cagliostrolab/animagine-xl-3.1 |
✒️Citation
If you find this repository or our paper useful, please consider citing:
@misc{he2024aid,
  title={AID: Attention Interpolation of Text-to-Image Diffusion},
  author={Qiyuan He and Jinghao Wang and Ziwei Liu and Angela Yao},
  year={2024},
  eprint={2403.17924},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
❤️ Acknowledgement
We thank the following repositories for their great work: diffusers, transformers.
➕️ More Results with SD1.5
Realist Style
Pikachu -> Gundam
Computer -> Phone
Anime Style
Ninja -> Cat
Ninja -> Dog
Oil-Painting Style
Starry Night -> Mona Lisa
Skyscraper -> Town