[Fix Issue #1197 since 2022] Support pre-trained openai/guided-diffusion (ADM) with minimal code change
What does this PR do?
TL;DR: This PR adds the openai/guided-diffusion (ADM) pre-trained models to diffusers, covering a diverse family of checkpoints trained on ImageNet, LSUN bedroom, LSUN cat, LSUN horse, and FFHQ.
The openai/guided-diffusion pre-trained models (unconditional ADM, 256x256) are widely used by the academic community (e.g., DPS, ICLR 2023: https://github.com/DPS2022/diffusion-posterior-sampling; FreeDOM, ICCV 2023: https://github.com/vvictoryuki/FreeDoM?tab=readme-ov-file), yet they are not supported in huggingface/diffusers.
This issue was raised as early as 2022 (https://github.com/huggingface/diffusers/issues/1197) but was left unsolved, as it is indeed quite complicated.
I have kept the changes to UNet2DModel as minimal as possible, so that it
- remains backward compatible with previous models;
- supports openai/guided-diffusion (ADM), so that the pre-trained openai/guided-diffusion models can be used like any standard unconditional diffusion model (https://huggingface.co/docs/diffusers/using-diffusers/unconditional_image_generation).
These changes include:
- Interface of UNet2DModel: a new argument, attention_legacy_order (a usage sketch follows this list).
- Two necessary building blocks:
  - ADM's time_proj: theoretically it is the same as the diffusers implementation, but numerically the two differ, and replacing one with the other breaks the model.
  - An attention_legacy_order mode in the Attention class: the legacy ordering is necessary, and using the diffusers attention directly breaks the model.
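A minimal sketch of how the new flag might be used, assuming it is exposed as a boolean constructor keyword on UNet2DModel as described above; the other arguments here are illustrative, not the exact converted configs:

```python
from diffusers import UNet2DModel

# Hypothetical construction of an ADM-style UNet on this PR's branch.
# `attention_legacy_order=True` requests the legacy per-head qkv split order.
unet = UNet2DModel(
    sample_size=256,
    in_channels=3,
    out_channels=6,  # ADM's 256x256 models predict epsilon plus a learned variance
    attention_legacy_order=True,
)
```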
I have been very careful not to break any existing code and to keep the new code as short as possible.
I have provided a script that converts pre-trained openai/guided-diffusion checkpoints into Hugging Face-compatible models: https://github.com/tongdaxu/diffusers/blob/main/scripts/convert_adm_to_diffusers.py.
I have also provided my converted models together with their configs. Given the same input noise, the conversions have a mean absolute error of ~5e-5 and a relative absolute error of ~6e-5 against the original models (a verification sketch follows the list below); as the error is minimal, the conversion is correct. The complete list of converted models is:
- ImageNet: https://openaipublic.blob.core.windows.net/diffusion/jul-2021/256x256_diffusion_uncond.pt -> https://huggingface.co/xutongda/adm_imagenet_256x256_unconditional
- LSUN bedroom: https://openaipublic.blob.core.windows.net/diffusion/jul-2021/lsun_bedroom.pt -> https://huggingface.co/xutongda/adm_lsun_bedroom_256x256
- LSUN cat: https://openaipublic.blob.core.windows.net/diffusion/jul-2021/lsun_cat.pt -> https://huggingface.co/xutongda/adm_lsun_cat_256x256
- LSUN horse: https://openaipublic.blob.core.windows.net/diffusion/jul-2021/lsun_horse.pt -> https://huggingface.co/xutongda/adm_lsun_horse_256x256
- FFHQ: https://drive.google.com/drive/folders/1jElnRoFv7b31fG0v6pTSQkelbSX3xGZh -> https://huggingface.co/xutongda/adm_ffhq_256x256/tree/main
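For reference, a minimal sketch of how such an error could be measured. It assumes eps_ref and eps_conv are the noise predictions of the original guided-diffusion UNet and the converted diffusers UNet2DModel on the same noisy input and timestep; the names are illustrative, not taken from the conversion script:

```python
import torch

def conversion_error(eps_ref: torch.Tensor, eps_conv: torch.Tensor):
    """Mean absolute error and relative absolute error between two noise predictions."""
    mae = (eps_ref - eps_conv).abs().mean()
    rel = mae / eps_ref.abs().mean()
    return mae.item(), rel.item()

# Values around 5e-5 (mae) and 6e-5 (rel) indicate a faithful conversion.
```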
Now we can sample from the openai/guided-diffusion pre-trained models with diffusers, out of the box:
```python
from diffusers import DiffusionPipeline

generator = DiffusionPipeline.from_pretrained("xutongda/adm_imagenet_256x256_unconditional").to("cuda")
image = generator().images[0]
image.save("generated_image.png")
```
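Optionally, assuming the converted checkpoints ship a standard DDPM-style pipeline, sampling can be made reproducible and the number of denoising steps adjusted through the usual pipeline arguments:

```python
import torch

# Seeded sampling with an explicit number of denoising steps (1000 is ADM's default schedule).
image = generator(num_inference_steps=1000, generator=torch.Generator("cuda").manual_seed(0)).images[0]
```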
And the result is as good as the original openai/guided-diffusion model:
- Sample in diffusers with the converted model
- Sample in openai/guided-diffusion with the original model
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [x] Did you read the contributor guideline?
- [x] Did you read our philosophy doc (important for complex PRs)?
- [x] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- [x] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [x] Did you write any new necessary tests?
Who can review?
@patrickvonplaten @yiyixuxu and @sayakpaul
Some other samples using the converted models with diffusers:
- Samples from LSUN cat model
```python
from diffusers import DiffusionPipeline

generator = DiffusionPipeline.from_pretrained("xutongda/adm_lsun_cat_256x256").to("cuda")
image = generator().images[0]
image.save("generated_image.png")
```
- Samples from FFHQ model
```python
from diffusers import DiffusionPipeline

generator = DiffusionPipeline.from_pretrained("xutongda/adm_ffhq_256x256").to("cuda")
image = generator().images[0]
image.save("generated_image.png")
```
- Samples from LSUN horse model
```python
from diffusers import DiffusionPipeline

generator = DiffusionPipeline.from_pretrained("xutongda/adm_lsun_horse_256x256").to("cuda")
image = generator().images[0]
image.save("generated_image.png")
```
- Samples from LSUN bedroom model
```python
from diffusers import DiffusionPipeline

generator = DiffusionPipeline.from_pretrained("xutongda/adm_lsun_bedroom_256x256").to("cuda")
image = generator().images[0]
image.save("generated_image.png")
```
Thanks very much for your work on this.
I agree that ADM is still very much used by the academic community but probably doesn't have a lot of real-world significance because of the lower quality. On the other hand, we do support Consistency Models as well as the original DDPM and DDIM models to respect the literature.
So, given the above point and also considering the minimal changes introduced in this PR, I'd be supportive of adding it. My only major feedback would be to try to not use legacy attention blocks if possible.
@patrickvonplaten @yiyixuxu WDYT here?
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
The problem here is that all the official pre-trained ADM models released by openai use legacy attention, so I really have no choice but to use it. I have tried using the diffusers attention directly, but the model produces garbage images like this (supposed to be a bedroom):
Can we try to maybe find a way to port the legacy attention to the one that's used now?
Sorry, I did not quite get what you mean by "port". Do you mean creating a separate class for the legacy attention and selecting it with an argument like attention_type?
See my comment here https://github.com/huggingface/diffusers/pull/6730/files#r1468858192
In fact, the part you are referring to is about model conversion only, and I have already handled it by calling the code in https://github.com/tongdaxu/diffusers/blob/main/scripts/convert_consistency_to_diffusers.py#L143. In that sense we can indeed unify the model weights of openai/guided-diffusion and diffusers, but it has nothing to do with the legacy vs. non-legacy attention order; it is purely about the parameterization of the linear layers.
What cannot be avoided, however, is the runtime difference between legacy and non-legacy attention. The "qkv, q, k, v" you are referring to are model weights, whereas the "qkv, q, k, v" I am referring to are activation tensors; they are different things with different shapes.
In openai/guided-diffusion, normal attention and legacy attention are implemented as separate classes:
- normal: https://github.com/openai/guided-diffusion/blob/main/guided_diffusion/unet.py#L361
- legacy: https://github.com/openai/guided-diffusion/blob/main/guided_diffusion/unet.py#L328
These two attention variants use exactly the same model weights, and the classes themselves have no parameters at all, so the difference cannot be absorbed by clever tricks during model conversion; it has to be handled at runtime.
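To make the difference concrete, here is a rough paraphrase of the two classes linked above (not the PR diff): both consume the same qkv projection of shape (B, 3 * heads * ch, T), but split it in a different channel order, so swapping one for the other scrambles which channels are read as q, k, and v.

```python
import torch

def split_legacy(qkv: torch.Tensor, n_heads: int):
    # Legacy order: heads first, then q/k/v.
    # Expects channel layout [h0_q, h0_k, h0_v, h1_q, h1_k, h1_v, ...].
    bs, width, length = qkv.shape
    ch = width // (3 * n_heads)
    return qkv.reshape(bs * n_heads, ch * 3, length).split(ch, dim=1)

def split_new(qkv: torch.Tensor, n_heads: int):
    # New order: q/k/v first, then heads.
    # Expects channel layout [q_all_heads, k_all_heads, v_all_heads].
    bs, width, length = qkv.shape
    ch = width // (3 * n_heads)
    q, k, v = qkv.chunk(3, dim=1)
    return tuple(x.reshape(bs * n_heads, ch, length) for x in (q, k, v))
```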
Hi @sayakpaul, any further comments?
Hi, I have found a way to avoid breaking possible backward dependencies in the UNetMidBlock2D class and have updated the PR. The change is still minimal and does not break anything.
I would love your advice, @yiyixuxu @sayakpaul.
Let's add this model to the research_projects folder no? It's a bit too outdated to be in core diffusers I'd say (cc @yiyixuxu)
@tongdaxu can we move this to the research folder?
I am fine with that. What should I do to move this to the research folder?
You can follow the structure of https://github.com/huggingface/diffusers/tree/main/examples/research_projects/controlnetxs as an example. Here's what you could consider: put the conversion script, modeling, and pipeline files under a single folder and make sure they work.
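For illustration only, a hypothetical layout for such a folder (file names are made up, not prescribed by the maintainers):

```
examples/research_projects/adm/
├── README.md                     # usage and links to the converted checkpoints
├── convert_adm_to_diffusers.py   # conversion script
├── unet_adm.py                   # modeling file with the legacy-attention UNet
└── pipeline_adm.py               # unconditional sampling pipeline
```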
Hi @tongdaxu, thank you for your great work!
I am having trouble generating nice images with your PR. I hope you can help :)
I installed the PR as follows:
1. conda create -n newenv python=3.9
2. conda activate newenv
3. pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
4. pip install git+https://github.com/huggingface/diffusers.git@refs/pull/6730/head
Then I ran:
```python
from diffusers import DiffusionPipeline

generator = DiffusionPipeline.from_pretrained("xutongda/adm_imagenet_256x256_unconditional").to("cuda")
image = generator().images[0]
image.save("generated_image.png")
```
My generated images look like this:
Did I install the PR wrong or is there a bug?
I am out of office until 17 Feb; could you try the other models first? Also, the Hugging Face model hub has been updated partway through the commits. Are you using the latest models?
Sorry, I do not have access to a GPU right now, but the instructions in https://github.com/tongdaxu/InverseDiffusion should work. Would you like to give them a try?
Thank you for your quick response. Sadly, your instructions did not work either.
I tried all versions of your repository and different pretrained models, but I still get bad results, e.g., FFHQ:
I believe it is a bug on my side. It could be the last force push. I will fix it ASAP.
Hi, I just ran a small test with https://github.com/tongdaxu/diffusers/commit/111eac139f3dc6ff47c50810a35044cad9b323b1, and it seems to be OK.
I am not sure what is happening here, and I might need more time to figure it out once I am back in the office after 17 Feb.
I can only run small sanity checks for now. All I can say is that with the commit above and the ImageNet model (I checked the hash sum), sampling should be fine. I am not sure whether there is a dependency problem (I am using torch 2.1.0); I may need to be back in the office for more testing.
Thanks for pointing this out and for your patience.
Hi, thank you very much for your help! The problem was actually the PyTorch version. I now tried with torch-2.2.0 and it works fine.
Thank you @kschwethelm, that is very strange. I also find that it fails with torch 1.9 and works with torch 2.1.
I have no clue why it fails. Have you figured out why?
I can't remember adding any torch-version-sensitive code.
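Based only on the reports above (torch 1.9/1.11 failing, torch 2.1/2.2 working, root cause unknown), a hedged sanity check one could run before sampling:

```python
import torch

# Warn on torch versions that were reported to produce garbage samples in this thread.
major, minor = (int(x) for x in torch.__version__.split("+")[0].split(".")[:2])
if (major, minor) < (2, 0):
    print(f"Warning: torch {torch.__version__} was reported to give bad ADM samples; "
          "torch >= 2.1 worked for the reporters.")
```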
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Not stale. @yiyixuxu WDYT?
How can I fine-tune from the ADM pre-trained models?