[Fix Issue #1197 since 2022] Support pre-trained openai/guided-diffusion (ADM) with minimal code change
What does this PR do?
TL;DR: This PR adds the openai/guided-diffusion (ADM) pre-trained models to diffusers, covering a diverse family of checkpoints trained on ImageNet, LSUN bedroom, LSUN cat, LSUN horse, and FFHQ.
The openai/guided-diffusion pre-trained models (unconditional ADM, 256x256) are widely used by the academic community (e.g., DPS, ICLR 2023: https://github.com/DPS2022/diffusion-posterior-sampling; FreeDOM, ICCV 2023: https://github.com/vvictoryuki/FreeDoM?tab=readme-ov-file), yet they are not supported in huggingface/diffusers.
This issue was raised as early as 2022 (https://github.com/huggingface/diffusers/issues/1197) but was left unsolved, as it is indeed quite complicated.
I have kept the changes to UNet2DModel as minimal as possible, so that it
- remains backward compatible with previous models;
- supports openai/guided-diffusion (ADM), so that the pre-trained openai/guided-diffusion models can be used like any standard unconditional diffusion model (https://huggingface.co/docs/diffusers/using-diffusers/unconditional_image_generation).
These changes include:
- Interface of UNet2DModel: a new argument, attention_legacy_order (a usage sketch follows this list).
- Two necessary building blocks:
  - ADM's time_proj: theoretically it is the same as the diffusers implementation, but numerically the two differ, and replacing one with the other breaks the model.
  - An attention_legacy_order mode in the Attention class: the legacy ordering is necessary, and using the diffusers attention directly breaks the model.
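A minimal sketch of how the new flag might be used, assuming it is exposed as a boolean constructor keyword on UNet2DModel as described above; the other arguments here are illustrative, not the exact converted configs:

```python
from diffusers import UNet2DModel

# Hypothetical construction of an ADM-style UNet on this PR's branch.
# `attention_legacy_order=True` requests the legacy per-head qkv split order.
unet = UNet2DModel(
    sample_size=256,
    in_channels=3,
    out_channels=6,  # ADM's 256x256 models predict epsilon plus a learned variance
    attention_legacy_order=True,
)
```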
I have been very careful not to break any existing code and to keep the new code as short as possible.
I have provided a script that converts pre-trained openai/guided-diffusion checkpoints into Hugging Face-compatible models: https://github.com/tongdaxu/diffusers/blob/main/scripts/convert_adm_to_diffusers.py.
I have also provided my converted models together with their configs. Given the same input noise, the conversions have a mean absolute error of ~5e-5 and a relative absolute error of ~6e-5 against the original models (a verification sketch follows the list below); as the error is minimal, the conversion is correct. The complete list of converted models is:
- ImageNet: https://openaipublic.blob.core.windows.net/diffusion/jul-2021/256x256_diffusion_uncond.pt -> https://huggingface.co/xutongda/adm_imagenet_256x256_unconditional
- LSUN bedroom: https://openaipublic.blob.core.windows.net/diffusion/jul-2021/lsun_bedroom.pt -> https://huggingface.co/xutongda/adm_lsun_bedroom_256x256
- LSUN cat: https://openaipublic.blob.core.windows.net/diffusion/jul-2021/lsun_cat.pt -> https://huggingface.co/xutongda/adm_lsun_cat_256x256
- LSUN horse: https://openaipublic.blob.core.windows.net/diffusion/jul-2021/lsun_horse.pt -> https://huggingface.co/xutongda/adm_lsun_horse_256x256
- FFHQ: https://drive.google.com/drive/folders/1jElnRoFv7b31fG0v6pTSQkelbSX3xGZh -> https://huggingface.co/xutongda/adm_ffhq_256x256/tree/main
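For reference, a minimal sketch of how such an error could be measured. It assumes eps_ref and eps_conv are the noise predictions of the original guided-diffusion UNet and the converted diffusers UNet2DModel on the same noisy input and timestep; the names are illustrative, not taken from the conversion script:

```python
import torch

def conversion_error(eps_ref: torch.Tensor, eps_conv: torch.Tensor):
    """Mean absolute error and relative absolute error between two noise predictions."""
    mae = (eps_ref - eps_conv).abs().mean()
    rel = mae / eps_ref.abs().mean()
    return mae.item(), rel.item()

# Values around 5e-5 (mae) and 6e-5 (rel) indicate a faithful conversion.
```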
Now we can sample from the openai/guided-diffusion pre-trained models with diffusers, out of the box:
```python
from diffusers import DiffusionPipeline

generator = DiffusionPipeline.from_pretrained("xutongda/adm_imagenet_256x256_unconditional").to("cuda")
image = generator().images[0]
image.save("generated_image.png")
```
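Optionally, assuming the converted checkpoints ship a standard DDPM-style pipeline, sampling can be made reproducible and the number of denoising steps adjusted through the usual pipeline arguments:

```python
import torch

# Seeded sampling with an explicit number of denoising steps (1000 is ADM's default schedule).
image = generator(num_inference_steps=1000, generator=torch.Generator("cuda").manual_seed(0)).images[0]
```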
And the result is as good as the original openai/guided-diffusion model:
- Sample in diffusers with the converted model
- Sample in openai/guided-diffusion with the original model
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [x] Did you read the contributor guideline?
- [x] Did you read our philosophy doc (important for complex PRs)?
- [x] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- [x] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [x] Did you write any new necessary tests?
Who can review?
@patrickvonplaten @yiyixuxu and @sayakpaul
Some other samples using the converted models with diffusers:
- Samples from LSUN cat model
```python
from diffusers import DiffusionPipeline

generator = DiffusionPipeline.from_pretrained("xutongda/adm_lsun_cat_256x256").to("cuda")
image = generator().images[0]
image.save("generated_image.png")
```
- Samples from FFHQ model
```python
from diffusers import DiffusionPipeline

generator = DiffusionPipeline.from_pretrained("xutongda/adm_ffhq_256x256").to("cuda")
image = generator().images[0]
image.save("generated_image.png")
```
- Samples from LSUN horse model
```python
from diffusers import DiffusionPipeline

generator = DiffusionPipeline.from_pretrained("xutongda/adm_lsun_horse_256x256").to("cuda")
image = generator().images[0]
image.save("generated_image.png")
```
- Samples from LSUN bedroom model
```python
from diffusers import DiffusionPipeline

generator = DiffusionPipeline.from_pretrained("xutongda/adm_lsun_bedroom_256x256").to("cuda")
image = generator().images[0]
image.save("generated_image.png")
```
Thanks very much for your work on this.
I agree that ADM is still very much used by the academic community but probably doesn't have a lot of real-world significance because of the lower quality. On the other hand, we do support Consistency Models as well as the original DDPM and DDIM models to respect the literature.
So, given the above point and also considering the minimal changes introduced in this PR, I'd be supportive of adding it. My only major feedback would be to try to not use legacy attention blocks if possible.
@patrickvonplaten @yiyixuxu WDYT here?
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
The problem here is that all the official pre-trained ADM models released by openai use legacy attention, so I really have no choice but to use it. I have tried using the diffusers attention directly, but the model produces garbage images like this (supposed to be a bedroom):
Can we try to maybe find a way to port the legacy attention to the one that's used now?
Sorry, I did not quite get what you mean by "port". Do you mean creating a separate class for the legacy attention and selecting it with an argument like attention_type?
See my comment here https://github.com/huggingface/diffusers/pull/6730/files#r1468858192
In fact, the part you are referring to is about model conversion only, and I have already handled it by calling the code in https://github.com/tongdaxu/diffusers/blob/main/scripts/convert_consistency_to_diffusers.py#L143. In that sense we can indeed unify the model weights of openai/guided-diffusion and diffusers, but it has nothing to do with the legacy vs. non-legacy attention order; it is purely about the parameterization of the linear layers.
What cannot be avoided, however, is the runtime difference between legacy and non-legacy attention. The "qkv, q, k, v" you are referring to are model weights, whereas the "qkv, q, k, v" I am referring to are activation tensors; they are different things with different shapes.
In openai/guided-diffusion, normal attention and legacy attention are implemented as separate classes:
- normal: https://github.com/openai/guided-diffusion/blob/main/guided_diffusion/unet.py#L361
- legacy: https://github.com/openai/guided-diffusion/blob/main/guided_diffusion/unet.py#L328
These two attention variants use exactly the same model weights, and the classes themselves have no parameters at all, so the difference cannot be absorbed by clever tricks during model conversion; it has to be handled at runtime.
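To make the difference concrete, here is a rough paraphrase of the two classes linked above (not the PR diff): both consume the same qkv projection of shape (B, 3 * heads * ch, T), but split it in a different channel order, so swapping one for the other scrambles which channels are read as q, k, and v.

```python
import torch

def split_legacy(qkv: torch.Tensor, n_heads: int):
    # Legacy order: heads first, then q/k/v.
    # Expects channel layout [h0_q, h0_k, h0_v, h1_q, h1_k, h1_v, ...].
    bs, width, length = qkv.shape
    ch = width // (3 * n_heads)
    return qkv.reshape(bs * n_heads, ch * 3, length).split(ch, dim=1)

def split_new(qkv: torch.Tensor, n_heads: int):
    # New order: q/k/v first, then heads.
    # Expects channel layout [q_all_heads, k_all_heads, v_all_heads].
    bs, width, length = qkv.shape
    ch = width // (3 * n_heads)
    q, k, v = qkv.chunk(3, dim=1)
    return tuple(x.reshape(bs * n_heads, ch, length) for x in (q, k, v))
```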
Hi @sayakpaul, any further comments?
Hi, I have found a way to avoid breaking possible backward dependencies in the UNetMidBlock2D class and have updated the PR. The change is still minimal and does not break anything.
I would love your advice, @yiyixuxu @sayakpaul.
Let's add this model to the research_projects folder no? It's a bit too outdated to be in core diffusers I'd say (cc @yiyixuxu)
@tongdaxu can we move this to the research folder?
I am fine with that. What should I do to move this to the research folder?
You can follow the structure of https://github.com/huggingface/diffusers/tree/main/examples/research_projects/controlnetxs as an example. Here's what you could consider: put the conversion script, modeling, and pipeline files under a single folder and make sure they work.
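For illustration only, a hypothetical layout for such a folder (file names are made up, not prescribed by the maintainers):

```
examples/research_projects/adm/
├── README.md                     # usage and links to the converted checkpoints
├── convert_adm_to_diffusers.py   # conversion script
├── unet_adm.py                   # modeling file with the legacy-attention UNet
└── pipeline_adm.py               # unconditional sampling pipeline
```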
Hi @tongdaxu, thank you for your great work!
I am having trouble generating nice images with your PR. I hope you can help :)
I installed the PR as follows:
1. conda create -n newenv python=3.9
2. conda activate newenv
3. pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
4. pip install git+https://github.com/huggingface/diffusers.git@refs/pull/6730/head
Then I ran:
```python
from diffusers import DiffusionPipeline

generator = DiffusionPipeline.from_pretrained("xutongda/adm_imagenet_256x256_unconditional").to("cuda")
image = generator().images[0]
image.save("generated_image.png")
```
My generated images look like this:
Did I install the PR wrong or is there a bug?
I am out of office until 17 Feb; could you try the other models first? Also, the Hugging Face model hub has been updated partway through the commits. Are you using the latest models?
Sorry, I do not have access to a GPU right now, but the instructions in https://github.com/tongdaxu/InverseDiffusion should work. Would you like to give them a try?
Thank you for your quick response. Sadly, your instructions did not work either.
I tried all versions of your repository and different pretrained models, but I still get bad results, e.g., FFHQ:
I believe it is a bug on my side. It could be the last force push. I will fix it ASAP.
Hi, I just ran a small test with https://github.com/tongdaxu/diffusers/commit/111eac139f3dc6ff47c50810a35044cad9b323b1, and it seems to be OK.
I am not sure what is happening here, and I might need more time to figure it out once I am back in the office after 17 Feb.
I can only run small sanity checks for now. All I can say is that with the commit above and the ImageNet model (I checked the hash sum), sampling should be fine. I am not sure whether there is a dependency problem (I am using torch 2.1.0); I may need to be back in the office for more testing.
Thanks for pointing this out and for your patience.
Hi, thank you very much for your help! The problem was actually the PyTorch version. I now tried with torch-2.2.0 and it works fine.
Thank you @kschwethelm, that is very strange. I also find that it fails with torch 1.9 and works with torch 2.1.
I have no clue why it fails. Have you figured out why?
I can't remember adding any torch-version-sensitive code.
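Based only on the reports above (torch 1.9/1.11 failing, torch 2.1/2.2 working, root cause unknown), a hedged sanity check one could run before sampling:

```python
import torch

# Warn on torch versions that were reported to produce garbage samples in this thread.
major, minor = (int(x) for x in torch.__version__.split("+")[0].split(".")[:2])
if (major, minor) < (2, 0):
    print(f"Warning: torch {torch.__version__} was reported to give bad ADM samples; "
          "torch >= 2.1 worked for the reporters.")
```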
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Not stale. @yiyixuxu WDYT?
How can I fine-tune from the ADM pre-trained models?