Why does the performance get worse after I converted a stable diffusion checkpoint to diffusers?
I fine-tuned a stable diffusion model and saved the checkpoint, which is ~14 GB. I then used the script convert_original_stable_diffusion_to_diffusers.py from this repo to convert it to diffusers, which is great since it's much more convenient to use.
However, when I test txt2img with the converted diffusers pipeline, the quality gets worse. On the diffusers side, I didn't use half precision, and I tried the same seed and number of steps as with stable diffusion.
Could you please suggest how to maintain the same quality as the original stable diffusion model?
Thanks a lot.
I also tried converting the diffusers model back to the original stable diffusion format, which resulted in a 4 GB ckpt, much smaller than my original checkpoint. It seems something is lost during the conversion.
I converted stable diffusion to diffusers with:
python convert_original_sd_to_diffusers.py --checkpoint_path '$ckptpath' --scheduler_type 'ddim' --dump_path '$outputpath'
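(For reference, this is roughly what I mean by testing without half precision; just a minimal sketch, with the dump path and prompt as placeholders:)
from diffusers import StableDiffusionPipeline

# Load the converted pipeline; leaving out torch_dtype keeps the weights at their
# original (float32) precision, while torch_dtype=torch.float16 would switch to half precision.
pipe = StableDiffusionPipeline.from_pretrained("path/to/outputpath")  # placeholder dump path
pipe = pipe.to("cuda")
image = pipe("a test prompt", num_inference_steps=50).images[0]  # placeholder prompt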
Hey @rorepmezzz,
I'm guessing you're not extracting the EMA weights but the final fine-tuned weights, while you're using the EMA weights with CompVis.
Could you try running the following conversion command instead:
python convert_original_sd_to_diffusers.py --checkpoint_path '$ckptpath' --scheduler_type 'ddim' --dump_path '$outputpath' --extract_ema
and check whether the results are better this way in diffusers? It would be great if you could give me feedback; I'm really curious to find out what's going on there.
Hi @patrickvonplaten, thanks a lot for the suggestions. I tried adding "--extract_ema", but the output images don't differ much from the previous approach. I also tried converting the diffusers model back to SD, and then I could only generate random noise with 200 steps in SD.
Are there any other possible reasons?
Hmmm, interesting, usually converting CompVis checkpoints works pretty well.
Could you maybe upload your CompVis checkpoint to a repo on the Hub and I'll try to convert it?
@patrickvonplaten That's so great! Please take a look when you have time. And sorry for the late reply, I didn't get a chance to check GitHub before...
My ckpt is in this repo: ringhyacinth/nail-diffusion, checkpoint: epoch=000479.ckpt, yaml config: /configs/2022-11-11T06-43-01-project.yaml
I've been struggling with this conversion issue for several days, since I really want to use the nice API on the Hub. Please let me know if you can convert it with the original quality, and let me know if I'm doing anything wrong on my side.
Thank you so much!
I'm facing the same issue. My weights are of type torch.cuda.HalfTensor.
Looking now!
Hey @rorepmezzz,
I ran the following command:
python diffusers/scripts/convert_original_stable_diffusion_to_diffusers.py --extract_ema --checkpoint_path nail-diffusion/epoch=000479.ckpt --dump_path ./nail-diffusion
to convert your checkpoint. Note that the parameter --extract_ema is added to extract the EMA weights (I don't think this is done by default).
Your diffusers checkpoint is uploaded here: https://huggingface.co/ringhyacinth/nail-diffusion/commit/f3809cce4ef2787435613dc07160e071d45f7e3f
Can you test it and report back if performance matches?
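(Just as a sketch, a quick way to double-check which scheduler the converted repo defaults to, since that has to match the sampler used on the CompVis side:)
from diffusers import StableDiffusionPipeline

# Load the converted pipeline straight from the Hub and inspect its default scheduler.
pipe = StableDiffusionPipeline.from_pretrained("ringhyacinth/nail-diffusion")
print(pipe.scheduler.__class__.__name__)  # e.g. PNDMScheduler or DDIMScheduler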
Thanks @patrickvonplaten for help!
I tested the same prompts with diffusers and the original SD, running 200 inference steps for both models. The diffusers output is still poor in terms of content and details. For diffusers, I used the original weight precision instead of float16 to try to preserve the original precision.
I also noticed that the diffusers model converted with "--extract_ema" yields similar results to the one converted without "--extract_ema".
See the outputs for the prompt "butterfly" in the folder below, for example: https://huggingface.co/ringhyacinth/nail-diffusion/tree/main/images
Could you please advise whether there are any other potential issues? Thanks.
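(As a hypothetical sanity check, comparing the UNet weights of the dump converted with --extract_ema against the one converted without it would show whether EMA extraction changes anything; the local paths below are placeholders:)
import torch
from diffusers import UNet2DConditionModel

# Compare the UNet weights of the two converted dumps (with and without --extract_ema).
unet_ema = UNet2DConditionModel.from_pretrained("nail-diffusion-ema", subfolder="unet")        # placeholder path
unet_noema = UNet2DConditionModel.from_pretrained("nail-diffusion-no-ema", subfolder="unet")   # placeholder path

max_diff = max(
    (a - b).abs().max().item()
    for a, b in zip(unet_ema.state_dict().values(), unet_noema.state_dict().values())
)
print(f"max absolute weight difference: {max_diff}")  # ~0 would mean the two dumps are effectively identical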
Hmmm, what scheduler are you using? PLMS?
Do you mean for inference? For stable diffusion, I used DDIM. For diffusers, I just used the default settings.
Below is what I do for diffusers:
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained(repo_path)
pipe = pipe.to(device)
images = pipe(prompts, guidance_scale=7.5, num_inference_steps=50, seed=283).images
I see, ok, in diffusers you're currently using PLMS/PNDM, so the difference might come from this.
Could you try the following:
- Merge this PR: https://huggingface.co/ringhyacinth/nail-diffusion/discussions/1
- Run:
from diffusers import StableDiffusionPipeline, DDIMScheduler
pipe = StableDiffusionPipeline.from_pretrained(repo_path)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to(device)
images = pipe(prompts, guidance_scale=7.5, num_inference_steps=50, seed=283).images
(Note you should be on "main" branch for this to work)
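(One aside on seeding: depending on the diffusers version installed, the pipeline call may not accept a seed keyword at all; the supported way is to pass a torch.Generator, along these lines:)
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler

pipe = StableDiffusionPipeline.from_pretrained(repo_path)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to(device)

# Pass a torch.Generator created on the same device, instead of a seed keyword.
generator = torch.Generator(device).manual_seed(283)
images = pipe(prompts, guidance_scale=7.5, num_inference_steps=50, generator=generator).images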
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Greetings, I've got a similar problem: I lose quality when I run !python convert_diffusers_to_original_stable_diffusion.py --model_path $mdl_path --checkpoint_path $ckpt_path $half_arg. Is there a similar solution?
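(A hypothetical quick check for this case: inspecting the dtype of the exported checkpoint shows whether it was saved in half precision, which roughly halves the file size and can account for quality differences; the path below is a placeholder:)
import torch

# Load the exported CompVis-style checkpoint and inspect its weight dtype.
ckpt = torch.load("path/to/exported.ckpt", map_location="cpu")  # placeholder path
state_dict = ckpt.get("state_dict", ckpt)
first_weight = next(iter(state_dict.values()))
print(first_weight.dtype)  # torch.float16 here means the weights were saved in half precision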