
[Fix Issue #1197 since 2022] Support pre-trained openai/guided-diffusion (ADM) with minimal code change

Open tongdaxu opened this issue 1 year ago • 30 comments

What does this PR do?

TL;DR: This PR adds support for the openai/guided-diffusion (ADM) pre-trained models to diffusers, covering a diverse family of checkpoints trained on ImageNet, LSUN bedroom, LSUN cat, LSUN horse, and FFHQ.

The openai/guided-diffusion pre-trained models (ADM unconditional 256x256) are widely used by the academic community (e.g., DPS, ICLR 2023: https://github.com/DPS2022/diffusion-posterior-sampling; FreeDoM, ICCV 2023: https://github.com/vvictoryuki/FreeDoM?tab=readme-ov-file). However, they are not supported in huggingface/diffusers.

This issue was raised as early as 2022 (https://github.com/huggingface/diffusers/issues/1197) but was left unresolved, as it is indeed quite complicated.

I keep the changes to UNet2DModel as minimal as possible, so that it

  • remains backward compatible with previous models
  • supports openai/guided-diffusion (ADM), so that we can run its pre-trained models like any standard unconditional diffusion model (https://huggingface.co/docs/diffusers/using-diffusers/unconditional_image_generation).

These changes include:

  • Interface of UNet2DModel: a new argument, "attention_legacy_order"
  • Two necessary building blocks:
    • ADM's time_proj: theoretically it is the same as the diffusers implementation, but numerically the two differ, and replacing one with the other breaks the model (see the sketch after this list).
    • An attention_legacy_order mode in class Attention: the legacy order is necessary; using the diffusers attention directly breaks the model.
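
To make the time_proj point concrete, here is a minimal sketch of the two sinusoidal embeddings (paraphrased from openai/guided-diffusion's timestep_embedding and diffusers' get_timestep_embedding; exact defaults in your installed version may differ). ADM divides the frequency exponent by half and puts cos before sin, while the diffusers default divides by half - 1 and puts sin first, so the outputs never match exactly:

import math
import torch

def adm_timestep_embedding(timesteps, dim, max_period=10000):
    # ADM style: exponent divided by `half`, cos first
    half = dim // 2
    freqs = torch.exp(-math.log(max_period) * torch.arange(half).float() / half)
    args = timesteps[:, None].float() * freqs[None]
    return torch.cat([torch.cos(args), torch.sin(args)], dim=-1)

def diffusers_style_embedding(timesteps, dim, max_period=10000, freq_shift=1):
    # diffusers default style: exponent divided by `half - freq_shift`, sin first
    half = dim // 2
    freqs = torch.exp(-math.log(max_period) * torch.arange(half).float() / (half - freq_shift))
    args = timesteps[:, None].float() * freqs[None]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)

t = torch.arange(0, 1000, 100)
print((adm_timestep_embedding(t, 128) - diffusers_style_embedding(t, 128)).abs().max())
# clearly non-zero: swapping one projection for the other shifts every timestep embedding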

I have been very careful not to break any existing code and to keep the new code as short as possible.

I have provided a script to convert pre-trained openai/guided-diffusion checkpoints into huggingface-compatible models: https://github.com/tongdaxu/diffusers/blob/main/scripts/convert_adm_to_diffusers.py.
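
For reference, a conversion invocation might look like the following (the flag names here are illustrative assumptions; check the script's argument parser for the actual ones):

python scripts/convert_adm_to_diffusers.py \
    --checkpoint_path 256x256_diffusion_uncond.pt \
    --dump_path adm_imagenet_256x256_unconditional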

I have also provided my converted models with configs. Given the same input noise, the conversions have a mean absolute error of ~5e-5 and a relative absolute error of ~6e-5 against the originals. As the error is minimal, the conversion is correct. The complete list of converted models is:

  • ImageNet: https://openaipublic.blob.core.windows.net/diffusion/jul-2021/256x256_diffusion_uncond.pt -> https://huggingface.co/xutongda/adm_imagenet_256x256_unconditional
  • LSUN bedroom: https://openaipublic.blob.core.windows.net/diffusion/jul-2021/lsun_bedroom.pt -> https://huggingface.co/xutongda/adm_lsun_bedroom_256x256
  • LSUN cat: https://openaipublic.blob.core.windows.net/diffusion/jul-2021/lsun_cat.pt -> https://huggingface.co/xutongda/adm_lsun_cat_256x256
  • LSUN horse: https://openaipublic.blob.core.windows.net/diffusion/jul-2021/lsun_horse.pt -> https://huggingface.co/xutongda/adm_lsun_horse_256x256
  • FFHQ: https://drive.google.com/drive/folders/1jElnRoFv7b31fG0v6pTSQkelbSX3xGZh -> https://huggingface.co/xutongda/adm_ffhq_256x256/tree/main

Now we can sample from the pre-trained openai/guided-diffusion models using diffusers out of the box:

from diffusers import DiffusionPipeline

# load the converted ADM checkpoint as a standard unconditional pipeline
generator = DiffusionPipeline.from_pretrained("xutongda/adm_imagenet_256x256_unconditional").to("cuda")
image = generator().images[0]  # run the full sampling loop and take the first image
image.save("generated_image.png")

And the results are as good as those of the original openai/guided-diffusion model:

  • sample in diffusers with the converted model: sample
  • sample in openai/guided-diffusion with the original model: sample_adm

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [x] Did you read the contributor guideline?
  • [x] Did you read our philosophy doc (important for complex PRs)?
  • [x] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • [x] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • [x] Did you write any new necessary tests?

Who can review?

@patrickvonplaten @yiyixuxu and @sayakpaul

tongdaxu avatar Jan 27 '24 13:01 tongdaxu

Some other samples using the converted model with diffusers:

  • Samples from LSUN cat model
from diffusers import DiffusionPipeline

generator = DiffusionPipeline.from_pretrained("xutongda/adm_lsun_cat_256x256").to("cuda")
image = generator().images[0]
image.save("generated_image.png")

generated_image_cat

  • Samples from FFHQ model
from diffusers import DiffusionPipeline

generator = DiffusionPipeline.from_pretrained("xutongda/adm_ffhq_256x256").to("cuda")
image = generator().images[0]
image.save("generated_image.png")

generated_image_ffhq

  • Samples from LSUN horse model
from diffusers import DiffusionPipeline

generator = DiffusionPipeline.from_pretrained("xutongda/adm_lsun_horse_256x256").to("cuda")
image = generator().images[0]
image.save("generated_image.png")

generated_image_horse

  • Samples from LSUN bedroom model
from diffusers import DiffusionPipeline

generator = DiffusionPipeline.from_pretrained("xutongda/adm_lsun_bedroom_256x256").to("cuda")
image = generator().images[0]
image.save("generated_image.png")

generated_image_bedroom

tongdaxu avatar Jan 28 '24 07:01 tongdaxu

Thanks very much for your work on this.

I agree that ADM is still very much used by the academic community but probably doesn't have a lot of real-world significance because of the lower quality. On the other hand, we do support Consistency Models as well as the original DDPM and DDIM models to respect the literature.

So, given the above point and also considering the minimal changes introduced in this PR, I'd be supportive of adding it. My only major feedback would be to try to not use legacy attention blocks if possible.

@patrickvonplaten @yiyixuxu WDYT here?

sayakpaul avatar Jan 28 '24 11:01 sayakpaul

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

My only major feedback would be to try to not use legacy attention blocks if possible.

The problem here is that all the official pre-trained ADM checkpoints from OpenAI use legacy attention, so I really have no choice but to use it. I have tried using the diffusers attention directly, but the model produces garbage images like this (supposed to be a bedroom):

generated_image_bedroom_bad

tongdaxu avatar Jan 28 '24 12:01 tongdaxu

The problem here is that all the official pre-trained ADM checkpoints from OpenAI use legacy attention, so I really have no choice but to use it.

Can we try to maybe find a way to port the legacy attention to the one that's used now?

sayakpaul avatar Jan 28 '24 12:01 sayakpaul

The problem here is that all the official pre-trained ADM checkpoints from OpenAI use legacy attention, so I really have no choice but to use it.

Can we try to maybe find a way to port the legacy attention to the one that's used now?

Sorry, I did not quite get what you mean by "port". Did you mean to create a separate class for legacy attention and select it with an argument like attention_type?

tongdaxu avatar Jan 28 '24 12:01 tongdaxu

See my comment here https://github.com/huggingface/diffusers/pull/6730/files#r1468858192

sayakpaul avatar Jan 28 '24 13:01 sayakpaul

See my comment here https://github.com/huggingface/diffusers/pull/6730/files#r1468858192

In fact, the part you refer to is about model conversion only, and I have already done that by calling the code at https://github.com/tongdaxu/diffusers/blob/main/scripts/convert_consistency_to_diffusers.py#L143. In this way we can indeed unify the model weights of openai/guided-diffusion and diffusers. However, it has nothing to do with the legacy / non-legacy attention order; it is purely about the parameterization of the linear layers.

However, what cannot be avoided is the runtime difference between legacy and non-legacy attention. The "qkv, q, k, v" you are referring to are model weights; the "qkv, q, k, v" I am referring to are activation tensors. They are different things with different shapes.
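
For context, a rough sketch of what that weight-side unification amounts to: splitting the fused qkv projection into the separate q, k, v projections diffusers expects (assuming guided-diffusion's fused 1x1-conv layout; the shapes here are illustrative):

import torch

channels = 512
qkv_weight = torch.randn(3 * channels, channels, 1)  # hypothetical fused qkv conv weight

# split along the output dimension into separate q, k, v projections
q_w, k_w, v_w = qkv_weight.chunk(3, dim=0)  # each [channels, channels, 1]
# diffusers Attention uses linear layers, so drop the trailing conv dimension
q_w, k_w, v_w = (w.squeeze(-1) for w in (q_w, k_w, v_w))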

tongdaxu avatar Jan 28 '24 13:01 tongdaxu

See my comment here https://github.com/huggingface/diffusers/pull/6730/files#r1468858192

In openai/guided-diffusion, normal attention and legacy attention are implemented as separate classes:

  • normal: https://github.com/openai/guided-diffusion/blob/main/guided_diffusion/unet.py#L361
  • legacy: https://github.com/openai/guided-diffusion/blob/main/guided_diffusion/unet.py#L328

Those two attention variants operate on exactly the same model weights, and the two classes themselves have no parameters at all. So this cannot be fixed by clever tricks in model conversion; it has to be handled at runtime.
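
To make the runtime difference concrete, here is a condensed sketch of the two split orders (paraphrased from QKVAttentionLegacy and QKVAttention in guided_diffusion/unet.py): the legacy class folds heads into the batch dimension first and then splits the fused activation into q, k, v, while the newer class splits first and folds afterwards. For n_heads > 1 the two orders assign different channel slices to q, k, v, so the same weights produce different outputs:

import torch

bs, n_heads, ch, length = 2, 4, 8, 16
qkv = torch.randn(bs, 3 * n_heads * ch, length)  # fused activation from the qkv projection

# legacy order: fold heads into the batch dim first, then split into q, k, v
q1, k1, v1 = qkv.reshape(bs * n_heads, ch * 3, length).split(ch, dim=1)

# new order: split into q, k, v first, then fold heads into the batch dim
q2, k2, v2 = (x.reshape(bs * n_heads, ch, length) for x in qkv.chunk(3, dim=1))

print(torch.allclose(q1, q2))  # False for n_heads > 1: the channel slices differ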

tongdaxu avatar Jan 28 '24 13:01 tongdaxu

Hi @sayakpaul, any further comments?

tongdaxu avatar Jan 30 '24 01:01 tongdaxu

Hi, I have found a way to avoid breaking possible backward dependencies in class UNetMidBlock2D and have updated the PR. The change is still minimal and does not break anything.

I would love your advice, @yiyixuxu @sayakpaul.

tongdaxu avatar Feb 03 '24 10:02 tongdaxu

Let's add this model to the research_projects folder no? It's a bit too outdated to be in core diffusers I'd say (cc @yiyixuxu)

patrickvonplaten avatar Feb 09 '24 16:02 patrickvonplaten

@tongdaxu can we move this to the research folder?

yiyixuxu avatar Feb 10 '24 07:02 yiyixuxu

@tongdaxu can we move this to the research folder?

I am fine with that; what should I do to move this to the research folder?

tongdaxu avatar Feb 11 '24 12:02 tongdaxu

I am fine with that; what should I do to move this to the research folder?

You can follow the structure of https://github.com/huggingface/diffusers/tree/main/examples/research_projects/controlnetxs as an example. Here's what you could consider:

Have the conversion script, modeling, and pipeline files under a single folder, and make sure they work.
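
A hypothetical layout (the file names are illustrative, following the controlnetxs example):

examples/research_projects/adm/
├── README.md
├── convert_adm_to_diffusers.py
├── unet_2d_adm.py    # UNet2DModel variant with attention_legacy_order
└── pipeline_adm.py   # unconditional sampling pipeline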

sayakpaul avatar Feb 11 '24 14:02 sayakpaul

Hi @tongdaxu, thank you for your great work!

I am having trouble generating nice images with your PR. I hope you can help :)

I installed the PR as follows:

  1. conda create -n newenv python=3.9
  2. conda activate newenv
  3. pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
  4. pip install git+https://github.com/huggingface/diffusers.git@refs/pull/6730/head

Then I ran:

from diffusers import DiffusionPipeline

generator = DiffusionPipeline.from_pretrained("xutongda/adm_imagenet_256x256_unconditional").to("cuda")
image = generator().images[0]
image.save("generated_image.png")

My generated images look like this:

generated_image

Did I install the PR wrong or is there a bug?

kschwethelm avatar Feb 13 '24 15:02 kschwethelm

Did I install the PR wrong or is there a bug?

I am out of office until Feb 17; would you like to try the other models first? Also, the Hugging Face model hub has been updated midway through the commits. Are you using the latest models?

tongdaxu avatar Feb 14 '24 08:02 tongdaxu

Did I install the PR wrong or is there a bug?

Sorry, I do not have access to a GPU for now, but the instructions in https://github.com/tongdaxu/InverseDiffusion should work. Would you like to give them a try?

tongdaxu avatar Feb 14 '24 08:02 tongdaxu

Thank you for your quick response. Sadly, your instructions did not work either.

I tried all versions of your repository and different pretrained models, but I still get bad results, e.g., FFHQ:

generated_image

kschwethelm avatar Feb 14 '24 11:02 kschwethelm

Thank you for your quick response. Sadly, your instructions did not work either.

I tried all versions of your repository and different pretrained models, but I still get bad results, e.g., FFHQ:

generated_image

I believe it is a bug on my side. It could be the last force push. I will fix it ASAP.

tongdaxu avatar Feb 14 '24 14:02 tongdaxu

Thank you for your quick response. Sadly, your instructions did not work either.

I tried all versions of your repository and different pretrained models, but I still get bad results, e.g., FFHQ:

generated_image

Hi, I just ran a small test with https://github.com/tongdaxu/diffusers/commit/111eac139f3dc6ff47c50810a35044cad9b323b1. It seems to be ok.

capture (screenshot)

I am not sure what is happening here, and I might need more time to figure it out when I am back in the office after Feb 17.

I can only run some small sanity checks for now. All I can say is that with the commit above and the ImageNet model (I checked the hash sum), the sampling should be fine. I am not sure if there is some dependency problem (I am using torch 2.1.0); I may need to be back in the office for more testing.

Thanks for pointing it out and for your patience.

tongdaxu avatar Feb 14 '24 14:02 tongdaxu

Hi, thank you very much for your help! The problem was actually the PyTorch version. I tried again with torch 2.2.0 and it works fine.

generated_image

kschwethelm avatar Feb 14 '24 15:02 kschwethelm

4. pip install git+https://github.com/huggingface/diffusers.git@refs/pull/6730/head

Thank you @kschwethelm, that is very strange. I also find that it fails with torch 1.9 and works with torch 2.1.

I do not have a clue why it fails. Have you figured out why?

tongdaxu avatar Feb 14 '24 15:02 tongdaxu

Hi, thank you very much for your help! The problem was actually the PyTorch version. I now tried with torch-2.2.0 and it works fine.

generated_image

I can't remember adding any torch-version-sensitive code.

tongdaxu avatar Feb 14 '24 15:02 tongdaxu

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Mar 10 '24 15:03 github-actions[bot]

Not stale. @yiyixuxu WDYT?

sayakpaul avatar Mar 10 '24 15:03 sayakpaul

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Apr 05 '24 15:04 github-actions[bot]

How to fine-tune with the ADM pre-trained model?

jiangyuhangcn avatar Apr 26 '24 05:04 jiangyuhangcn