
Come on, come on, let's adapt the conversion script to SD 2.0

piEsposito opened this issue 1 year ago • 55 comments

Is your feature request related to a problem? Please describe. It would be great if we could run SD 2 with cpu_offload, attention slicing, xformers, etc...

Describe the solution you'd like Adapt the conversion script to SD 2.0

Describe alternatives you've considered Stability AI's repo is not as flexible.

piEsposito avatar Nov 24 '22 04:11 piEsposito

🤗 Diffusers with Stable Diffusion 2 is live!

anton-l commented (https://github.com/huggingface/diffusers/issues/1388#issuecomment-1327731012) diffusers==0.9.0 with Stable Diffusion 2 are live!

Installation pip install diffusers[torch]==0.9 transformers

Release Information https://github.com/huggingface/diffusers/releases/tag/v0.9.0

Contributors @kashif @pcuenca @patrickvonplaten @anton-l @patil-suraj

📰 News

✏️ Notes & Information

Related huggingface/diffusers Pull Requests:

👇 Quick Links:

  • https://github.com/Stability-AI/stablediffusion
  • https://huggingface.co/stabilityai/stable-diffusion-2
  • https://huggingface.co/stabilityai/stable-diffusion-2-base
  • https://huggingface.co/stabilityai/stable-diffusion-2-depth
  • https://huggingface.co/stabilityai/stable-diffusion-2-inpainting
  • https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler

👁️ User Submitted Resources:

💭 User Story (Prior to Huggingface Diffusers 0.9.0 Release)

Stability-AI has released Stable Diffusion 2.0 models/workflow. When you run convert_original_stable_diffusion_to_diffusers.py on the new Stability-AI/stablediffusion models the following errors occur.

convert_original_stable_diffusion_to_diffusers.py --checkpoint_path="./512-inpainting-ema.ckpt" --dump_path="./512-inpainting-ema_diffusers"

Output:

Traceback (most recent call last):
File "convert_original_stable_diffusion_to_diffusers.py", line 720, in <module> 
        unet.load_state_dict(converted_unet_checkpoint)
File "lib\site-packages\torch\nn\modules\module.py", line 1667, in load_state_dict
        raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for UNet2DConditionModel:
        size mismatch for down_blocks.0.attentions.0.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for down_blocks.0.attentions.0.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
.... blocks.1.attentions blocks.2.attentions etc. etc.

averad avatar Nov 24 '22 04:11 averad
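Editor's note: the mismatches above come from two separate SD 2.0 changes. The cross-attention dimension grew from 768 (CLIP ViT-L) to 1024 (OpenCLIP ViT-H), and the transformer blocks' `proj_in`/`proj_out` became linear layers instead of 1x1 convolutions. A minimal sketch (numpy as a stand-in for checkpoint tensors, shapes taken from the traceback) of why the `[320, 320]` vs `[320, 320, 1, 1]` pair is the same data in two layouts:

```python
import numpy as np

# SD 2.0 checkpoints store proj_in as a Linear weight of shape [320, 320] ...
linear_w = np.random.randn(320, 320).astype(np.float32)

# ... while the pre-2.0 UNet2DConditionModel expects a 1x1 Conv2d weight,
# which just carries two extra singleton dimensions:
conv_w = linear_w[:, :, None, None]
assert conv_w.shape == (320, 320, 1, 1)

# The values are identical, so the real fix is a model config switch
# (linear vs conv projection), not a reshape hack in the checkpoint.
assert np.array_equal(conv_w.reshape(320, 320), linear_w)
```

The 320x1024 vs 320x768 mismatches, by contrast, are a genuine architecture change and cannot be reshaped away; they need the new model config.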

trying to but likely I won't be able to do it lol

devilismyfriend avatar Nov 24 '22 04:11 devilismyfriend

Semi-Related:

  • https://github.com/Stability-AI/stablediffusion/issues/4
  • https://github.com/Stability-AI/stablediffusion/issues/9
  • https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/5009 [dupe]
  • https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/5011
  • https://github.com/huggingface/diffusers/issues/1392
  • https://github.com/huggingface/diffusers/issues/1398
  • https://github.com/huggingface/diffusers/issues/1404
  • https://github.com/db0/AI-Horde/issues/86
  • https://github.com/JoePenna/Dreambooth-Stable-Diffusion/issues/112
  • https://github.com/TheLastBen/fast-stable-diffusion/issues/599
  • https://github.com/ShivamShrirao/diffusers/issues/143
  • https://github.com/Sygil-Dev/nataili/issues/67
  • https://github.com/Sygil-Dev/sygil-webui/issues/1686

0xdevalias avatar Nov 24 '22 04:11 0xdevalias

After looking at it I'm not sure it has anything to do with the script; it seems like the UNet in diffusers needs 4 dimensions in the tensor size.

devilismyfriend avatar Nov 24 '22 04:11 devilismyfriend

needs to have 4 dimensions

So I guess this will take time...

AugmentedRealityCat avatar Nov 24 '22 04:11 AugmentedRealityCat

needs to have 4 dimensions

So I guess this will take time...

maybe not, I'm not that knowledgeable on the subject but I assume a unet2D needs to be 4D, or maybe you can just artificially add it idk

devilismyfriend avatar Nov 24 '22 05:11 devilismyfriend

rudimentary support for stable diffusion 2.0

https://github.com/MrCheeze/stable-diffusion-webui/commit/069591b06bbbdb21624d489f3723b5f19468888d

Originally posted by @152334H in https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/5011#issuecomment-1325971596

0xdevalias avatar Nov 24 '22 05:11 0xdevalias

https://github.com/hafriedlander/diffusers/blob/stable_diffusion_2/scripts/convert_original_stable_diffusion_to_diffusers.py

Notes:

  • Only tested on the two txt2img models, not inpaint / depth2img / upscaling
  • You will need to change your text embedding to use the penultimate layer too
  • It spits out a bunch of warnings about vision_model, but that's fine
  • I have no idea if this is right or not. It generates images, no guarantee beyond that. (Hence no PR - if you're patient, I'm sure the Diffusers team will do a better job than I have)

hafriedlander avatar Nov 24 '22 08:11 hafriedlander

Here's an example of accessing the penultimate text embedding layer https://github.com/hafriedlander/stable-diffusion-grpcserver/blob/b34bb27cf30940f6a6a41f4b77c5b77bea11fd76/sdgrpcserver/pipeline/text_embedding/basic_text_embedding.py#L33

hafriedlander avatar Nov 24 '22 09:11 hafriedlander
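Editor's note: the linked code boils down to asking the text encoder for all hidden states and indexing the second-to-last one (the real call is `text_encoder(input_ids, output_hidden_states=True)`). A toy sketch of just the indexing, with strings standing in for the real `[batch, 77, 1024]` tensors:

```python
def penultimate_layer(hidden_states):
    """Pick the second-to-last per-layer hidden state a CLIP text encoder
    returns when called with output_hidden_states=True: index 0 is the input
    embeddings, -1 is the final layer, so -2 is the 'penultimate' layer that
    SD 2.0 was trained to condition on."""
    return hidden_states[-2]

# toy stand-in for the tuple of hidden states from a 23-layer encoder
states = ("embeddings", "layer_1", "...", "layer_22", "layer_23_final")
assert penultimate_layer(states) == "layer_22"
```

Pipelines that do this typically also re-apply the encoder's final layer norm to the selected state before using it for conditioning.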

https://github.com/hafriedlander/diffusers/blob/stable_diffusion_2/scripts/convert_original_stable_diffusion_to_diffusers.py

Notes:

  • Only tested on the two txt2img models, not inpaint / depth2img / upscaling
  • You will need to change your text embedding to use the penultimate layer too
  • It spits out a bunch of warnings about vision_model, but that's fine
  • I have no idea if this is right or not. It generates images, no guarantee beyond that. (Hence no PR - if you're patient, I'm sure the Diffusers team will do a better job than I have)

doesn't seem to work for me on the 768-v model using the v2 config for v

TypeError: EulerDiscreteScheduler.init() got an unexpected keyword argument 'prediction_type'

devilismyfriend avatar Nov 24 '22 10:11 devilismyfriend

Appears I'm also having unexpected argument error, but of a different arg:

Command:

python convert.py --checkpoint_path="models/512-base-ema.ckpt" --dump_path="outputs/" --original_config_file="v2-inference.yaml"

Result:

│ 736 │ unet = UNet2DConditionModel(**unet_config)
│ 737 │ unet.load_state_dict(converted_unet_checkpoint)
TypeError: init() got an unexpected keyword argument 'use_linear_projection'

I can't seem to find a resolution to this one.

CoffeeVampir3 avatar Nov 24 '22 10:11 CoffeeVampir3
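Editor's note: this TypeError means the installed diffusers' UNet2DConditionModel predates the SD 2.0 config keys. A hypothetical stopgap (class and keys below are illustrative stand-ins, not diffusers API) would be to filter the converted config down to accepted kwargs, though as the replies note, the proper fix is upgrading diffusers / merging PR #1386:

```python
import inspect

def filter_init_kwargs(cls, config):
    """Hypothetical stopgap: drop config keys this class's __init__ doesn't
    accept (e.g. 'use_linear_projection' on pre-SD2 diffusers) instead of
    crashing with a TypeError. Upgrading diffusers is the real fix."""
    accepted = inspect.signature(cls.__init__).parameters
    return {k: v for k, v in config.items() if k in accepted}

class OldUNet2DConditionModel:  # stand-in for an older __init__ signature
    def __init__(self, in_channels=4, cross_attention_dim=768):
        self.in_channels = in_channels
        self.cross_attention_dim = cross_attention_dim

cfg = {"in_channels": 4, "cross_attention_dim": 1024, "use_linear_projection": True}
unet = OldUNet2DConditionModel(**filter_init_kwargs(OldUNet2DConditionModel, cfg))
assert unet.cross_attention_dim == 1024
```

Note the filtered model would still lack the new linear-projection behavior, so this only gets past the constructor, not the weight-shape mismatches.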

You need to use the absolute latest Diffusers and merge this PR (or use my branch which has it in it) https://github.com/huggingface/diffusers/pull/1386

hafriedlander avatar Nov 24 '22 10:11 hafriedlander

(My branch is at https://github.com/hafriedlander/diffusers/tree/stable_diffusion_2)

hafriedlander avatar Nov 24 '22 10:11 hafriedlander

Amazing to see the excitement here! We'll merge #1386 in a bit :-)

patrickvonplaten avatar Nov 24 '22 10:11 patrickvonplaten

@patrickvonplaten the problems I've run into so far:

  • attention_slicing doesn't work when attention_head_dim is a list (maybe you have a more elegant solution than that)
  • tokenizer.model_max_length is max_long when using my converter above, so I use text_encoder.config.max_position_embeddings instead

hafriedlander avatar Nov 24 '22 10:11 hafriedlander
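Editor's note: the second workaround above can be sketched as a small fallback helper (names are illustrative stand-ins mirroring the comment, not a diffusers API):

```python
def resolve_max_length(tokenizer, text_encoder):
    """Hypothetical fallback: if a converted tokenizer reports a sentinel
    'huge' model_max_length, use the text encoder's positional-embedding
    limit (77 for CLIP text models) instead."""
    max_len = getattr(tokenizer, "model_max_length", None)
    if max_len is None or max_len > 10_000:
        max_len = text_encoder.config.max_position_embeddings
    return max_len

# toy stand-ins for a converted checkpoint with a broken tokenizer config
class Tok: model_max_length = int(1e30)
class Cfg: max_position_embeddings = 77
class Enc: config = Cfg()

assert resolve_max_length(Tok(), Enc()) == 77
```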

That's super helpful @hafriedlander - thanks!

BTW, weights for the 512x512 are up:

  • https://huggingface.co/stabilityai/stable-diffusion-2-base
  • https://huggingface.co/stabilityai/stable-diffusion-2-inpainting

Looking into the 768x768 model now

patrickvonplaten avatar Nov 24 '22 10:11 patrickvonplaten

Nice. Do you have a solution in mind for how to flag to the pipeline to use the penultimate layer in the CLIP model? (I just pass it in as an option at the moment)

hafriedlander avatar Nov 24 '22 10:11 hafriedlander

Can you send me a link? Does the pipeline not work out of the box? cc @anton-l @patil-suraj

patrickvonplaten avatar Nov 24 '22 11:11 patrickvonplaten

It works but I don't think it's correct. The Stability configuration files explicitly say to use the penultimate CLIP layer https://github.com/Stability-AI/stablediffusion/blob/33910c386eaba78b7247ce84f313de0f2c314f61/configs/stable-diffusion/v2-inference-v.yaml#L68

hafriedlander avatar Nov 24 '22 11:11 hafriedlander

It's relatively easy to get access to the penultimate layer. I do it in my custom pipeline like this:

https://github.com/hafriedlander/stable-diffusion-grpcserver/blob/b34bb27cf30940f6a6a41f4b77c5b77bea11fd76/sdgrpcserver/pipeline/text_embedding/basic_text_embedding.py#L33

The problem is knowing when to do it and when not to.

hafriedlander avatar Nov 24 '22 11:11 hafriedlander

I see! Thanks for the links - so they do this for both the 512x512 SD 2 and 768x768 SD 2 model?

patrickvonplaten avatar Nov 24 '22 11:11 patrickvonplaten

Both

hafriedlander avatar Nov 24 '22 11:11 hafriedlander

It's a technique NovelAI discovered FYI (https://blog.novelai.net/novelai-improvements-on-stable-diffusion-e10d38db82ac)

hafriedlander avatar Nov 24 '22 11:11 hafriedlander

Actually @patil-suraj solved it pretty cleanly by just removing the last layer: https://huggingface.co/stabilityai/stable-diffusion-2-inpainting/blob/main/text_encoder/config.json#L19

So this works out of the box

patrickvonplaten avatar Nov 24 '22 11:11 patrickvonplaten

Notice the difference between: https://huggingface.co/stabilityai/stable-diffusion-2-inpainting/blob/main/text_encoder/config.json#L19 and https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K/blob/main/config.json#L54

patrickvonplaten avatar Nov 24 '22 11:11 patrickvonplaten
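Editor's note: the trick in the linked configs is that the SD 2.0 text encoder declares 23 hidden layers where the original OpenCLIP ViT-H has 24, so the trimmed model's *last* hidden state is the full model's *penultimate* one and no pipeline changes are needed. A toy demonstration (each "layer" just adds 1, standing in for a transformer layer):

```python
def run_layers(x, num_layers):
    """Toy stand-in for a transformer encoder: apply num_layers 'layers'
    and record every intermediate hidden state."""
    states = [x]
    for _ in range(num_layers):
        x = x + 1
        states.append(x)
    return x, states

full_last, full_states = run_layers(0, 24)  # full 24-layer encoder
trimmed_last, _ = run_layers(0, 23)         # config trimmed to 23 layers

# The trimmed model's final output equals the full model's penultimate state:
assert trimmed_last == full_states[-2]
```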

Ah, nice. Yeah, that's cleaner.

hafriedlander avatar Nov 24 '22 11:11 hafriedlander

768x768 weights released:

  • https://huggingface.co/stabilityai/stable-diffusion-2/tree/main

fp16 and other versions of the models appear to be in progress and being uploaded.

averad avatar Nov 24 '22 11:11 averad

Testing is in progress on the horde: https://github.com/Sygil-Dev/nataili/tree/v2. Try out Stable Diffusion 2.0 on our UIs:

https://tinybots.net/artbot https://aqualxx.github.io/stable-ui/ https://dbzer0.itch.io/lucid-creations

https://sigmoid.social/@stablehorde/109398715339480426

SD 2.0

  • [x] Initial implementation ready for testing
  • [ ] img2img
  • [ ] inpainting
  • [ ] k_diffusers support

Originally posted by @AlRlC in https://github.com/Sygil-Dev/nataili/issues/67#issuecomment-1326385645

0xdevalias avatar Nov 24 '22 12:11 0xdevalias

  • https://github.com/TheLastBen/fast-stable-diffusion/commit/11fd38bfbd2f1ed42449b37ba88ba324ff42ba43
    • Create pathsV2.py

  • https://github.com/TheLastBen/fast-stable-diffusion/commit/fe445d986f08a1134f26f5efcd1c0829f34bc481
    • Support for SD V.2

  • https://github.com/TheLastBen/fast-stable-diffusion/commit/da9b38010c2edc8fcccf2b0b70f321af30c0ecb8
    • fix

  • https://github.com/TheLastBen/fast-stable-diffusion/commit/6c84728c72bd9735b0a5be4c62a292554c3b41d1
    • fix

  • https://github.com/TheLastBen/fast-stable-diffusion/commit/04ba92b1931ab6aa0269a0516640f8874b004885
    • fix

  • https://github.com/TheLastBen/fast-stable-diffusion/commit/ebea13401da873b3420fdf6f0fa02df567534a55
    • Create sd_hijackV2.py

  • https://github.com/TheLastBen/fast-stable-diffusion/commit/88496f5199c82e9c5ee2ae40bc980140d8cd4ce5
    • Create sd_samplersV2.py

  • https://github.com/TheLastBen/fast-stable-diffusion/commit/f324b3d85473d308ebeefb03de58ae6eb9070f42
    • fix V2

Originally posted by @0xdevalias in https://github.com/TheLastBen/fast-stable-diffusion/issues/599#issuecomment-1326446674

0xdevalias avatar Nov 24 '22 13:11 0xdevalias

Should work now, make sure you check the box "redownload original model" when choosing V2

https://colab.research.google.com/github/TheLastBen/fast-stable-diffusion/blob/main/fast_stable_diffusion_AUTOMATIC1111.ipynb

Requires more than 12GB of RAM for now, so free colab probably won't suffice.

Originally posted by @TheLastBen in https://github.com/TheLastBen/fast-stable-diffusion/issues/599#issuecomment-1326461962

0xdevalias avatar Nov 24 '22 13:11 0xdevalias