Why does the performance get worse after I converted a stable diffusion checkpoint to diffusers?
I fine-tuned a stable diffusion model and saved the checkpoint, which is ~14 GB. I then used the script convert_original_stable_diffusion_to_diffusers.py from this repo to convert it to diffusers, which is great since it's much more convenient to use.
However, when I test txt2img with the converted diffusers pipeline, the quality gets worse. On the diffusers side, I didn't use half precision, and I tried the same seed and number of steps as with stable diffusion.
Could you please suggest how to maintain the same quality as the original stable diffusion model?
Thanks a lot.
I also tried converting the diffusers model back to the original stable diffusion format, which resulted in a 4 GB ckpt, much smaller than my original checkpoint. It seems something is lost during the conversion.
I converted stable diffusion to diffusers with:
python convert_original_sd_to_diffusers.py --checkpoint_path '$ckptpath' --scheduler_type 'ddim' --dump_path '$outputpath'
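(For reference, this is roughly what I mean by testing without half precision; just a minimal sketch, with the dump path and prompt as placeholders:)
from diffusers import StableDiffusionPipeline

# Load the converted pipeline; leaving out torch_dtype keeps the weights at their
# original (float32) precision, while torch_dtype=torch.float16 would switch to half precision.
pipe = StableDiffusionPipeline.from_pretrained("path/to/outputpath")  # placeholder dump path
pipe = pipe.to("cuda")
image = pipe("a test prompt", num_inference_steps=50).images[0]  # placeholder prompt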
Hey @rorepmezzz,
I'm guessing you're not extracting the EMA weights but the final fine-tuned weights, while you're using the EMA weights with CompVis.
Could you try running the following conversion command instead:
python convert_original_sd_to_diffusers.py --checkpoint_path '$ckptpath' --scheduler_type 'ddim' --dump_path '$outputpath' --extract_ema
and check whether the results are better this way in diffusers? It would be great if you could give me feedback; I'm really curious to find out what's going on there.
Hi @patrickvonplaten, thanks a lot for the suggestions. I tried adding "--extract_ema", but the output images don't differ much from the previous approach. I also tried converting the diffusers model back to SD, and then I could only generate random noise with 200 steps in SD.
Are there any other possible reasons?
Hmmm, interesting, usually converting CompVis checkpoints works pretty well.
Could you maybe upload your CompVis checkpoint to a repo on the Hub and I'll try to convert it?
@patrickvonplaten That's so great! Please take a look when you have time. And sorry for the late reply, I didn't get a chance to check GitHub before...
My ckpt is in this repo: ringhyacinth/nail-diffusion, checkpoint: epoch=000479.ckpt, yaml config: /configs/2022-11-11T06-43-01-project.yaml
I've been struggling with this conversion issue for several days, since I really want to use the nice API on the Hub. Please let me know if you can convert it with the original quality, and let me know if I'm doing anything wrong on my side.
Thank you so much!
I'm facing the same issue. My weights are of type torch.cuda.HalfTensor.
Looking now!
Hey @rorepmezzz,
I ran the following command:
python diffusers/scripts/convert_original_stable_diffusion_to_diffusers.py --extract_ema --checkpoint_path nail-diffusion/epoch=000479.ckpt --dump_path ./nail-diffusion
to convert your checkpoint. Note that the parameter --extract_ema is added to extract the EMA weights (I don't think this is done by default).
Your diffusers checkpoint is uploaded here: https://huggingface.co/ringhyacinth/nail-diffusion/commit/f3809cce4ef2787435613dc07160e071d45f7e3f
Can you test it and report back if performance matches?
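(Just as a sketch, a quick way to double-check which scheduler the converted repo defaults to, since that has to match the sampler used on the CompVis side:)
from diffusers import StableDiffusionPipeline

# Load the converted pipeline straight from the Hub and inspect its default scheduler.
pipe = StableDiffusionPipeline.from_pretrained("ringhyacinth/nail-diffusion")
print(pipe.scheduler.__class__.__name__)  # e.g. PNDMScheduler or DDIMScheduler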
Thanks @patrickvonplaten for help!
I tested the same prompts with diffusers and the original SD, running 200 inference steps for both models. The diffusers output is still poor in terms of content and details. For diffusers, I used the original weight precision instead of float16 to try to preserve the original precision.
I also noticed that the diffusers model converted with "--extract_ema" yields similar results to the one converted without "--extract_ema".
See the outputs for the prompt "butterfly" in the folder below, for example: https://huggingface.co/ringhyacinth/nail-diffusion/tree/main/images
Could you please advise whether there are any other potential issues? Thanks.
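(As a hypothetical sanity check, comparing the UNet weights of the dump converted with --extract_ema against the one converted without it would show whether EMA extraction changes anything; the local paths below are placeholders:)
import torch
from diffusers import UNet2DConditionModel

# Compare the UNet weights of the two converted dumps (with and without --extract_ema).
unet_ema = UNet2DConditionModel.from_pretrained("nail-diffusion-ema", subfolder="unet")        # placeholder path
unet_noema = UNet2DConditionModel.from_pretrained("nail-diffusion-no-ema", subfolder="unet")   # placeholder path

max_diff = max(
    (a - b).abs().max().item()
    for a, b in zip(unet_ema.state_dict().values(), unet_noema.state_dict().values())
)
print(f"max absolute weight difference: {max_diff}")  # ~0 would mean the two dumps are effectively identical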
Hmmm, what scheduler are you using? PLMS?
Do you mean for inference? For stable diffusion, I used DDIM. For diffusers, I just used the default settings.
Below is what I do for diffusers:
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained(repo_path)
pipe = pipe.to(device)
images = pipe(prompts, guidance_scale=7.5, num_inference_steps=50, seed=283).images
I see, ok, in diffusers you're currently using PLMS/PNDM, so the difference might come from this.
Could you try the following:
- Merge this PR: https://huggingface.co/ringhyacinth/nail-diffusion/discussions/1
- Run:
from diffusers import StableDiffusionPipeline, DDIMScheduler
pipe = StableDiffusionPipeline.from_pretrained(repo_path)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to(device)
images = pipe(prompts, guidance_scale=7.5, num_inference_steps=50, seed=283).images
(Note you should be on "main" branch for this to work)
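(One aside on seeding: depending on the diffusers version installed, the pipeline call may not accept a seed keyword at all; the supported way is to pass a torch.Generator, along these lines:)
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler

pipe = StableDiffusionPipeline.from_pretrained(repo_path)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to(device)

# Pass a torch.Generator created on the same device, instead of a seed keyword.
generator = torch.Generator(device).manual_seed(283)
images = pipe(prompts, guidance_scale=7.5, num_inference_steps=50, generator=generator).images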
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Greetings, I've got a similar problem: I lose quality when I run !python convert_diffusers_to_original_stable_diffusion.py --model_path $mdl_path --checkpoint_path $ckpt_path $half_arg. Is there a similar solution?
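(A hypothetical quick check for this case: inspecting the dtype of the exported checkpoint shows whether it was saved in half precision, which roughly halves the file size and can account for quality differences; the path below is a placeholder:)
import torch

# Load the exported CompVis-style checkpoint and inspect its weight dtype.
ckpt = torch.load("path/to/exported.ckpt", map_location="cpu")  # placeholder path
state_dict = ckpt.get("state_dict", ckpt)
first_weight = next(iter(state_dict.values()))
print(first_weight.dtype)  # torch.float16 here means the weights were saved in half precision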