How can I convert the self-trained model folder to a single ckpt model file?
I trained a model in local PC following this instruction: diffusers\examples\textual_inversion\README.md
The output of the self-trained model is a folder.
I can use `StableDiffusionPipeline.from_pretrained("THE OUTPUT OF MODEL FOLDER PATH")` for inference.
BUT, the original stable diffusion model is just one single ckpt file, "sd-v1-4.ckpt", and most webui tools can only use ckpt files.
How can I convert the self-trained output model folder to a single ckpt model file?
For now, this cannot be done. The only way is to write your own diffusers-to-CompVis format converter.
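Such a converter essentially boils down to loading each component's state dict and renaming its keys into the layout the CompVis checkpoint expects. Here is a minimal sketch of that renaming step in plain Python; the two example keys are ones commonly seen in UNet checkpoints, but the full mapping is far longer, so treat the `key_map` below as illustrative only:

```python
# Sketch of the key-renaming step a diffusers -> CompVis converter needs.
# Plain lists stand in for tensors; a real converter would operate on the
# state dicts loaded from the model files.

def rename_keys(state_dict, key_map):
    """Return a new dict with keys translated via key_map; keys without
    a mapping are kept under their original name."""
    return {key_map.get(k, k): v for k, v in state_dict.items()}

# Illustrative example: diffusers-style UNet keys -> CompVis-style keys.
diffusers_sd = {
    "conv_in.weight": [1.0],
    "down_blocks.0.resnets.0.conv1.weight": [2.0],
}
key_map = {
    "conv_in.weight": "model.diffusion_model.input_blocks.0.0.weight",
    "down_blocks.0.resnets.0.conv1.weight":
        "model.diffusion_model.input_blocks.1.0.in_layers.2.weight",
}
compvis_sd = rename_keys(diffusers_sd, key_map)
```

The hard part of a real converter is building the complete `key_map` for every component (UNet, VAE, text encoder), not the renaming itself.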
Interesting!
cc @apolinario @patil-suraj
Which webui tool are you using @boyjunqiang ? Maybe we could see if we could integrate diffusers into the codebase or alternatively we could add some conversion scripts here.
It's very exciting news! We use this webui tool (https://github.com/AUTOMATIC1111/stable-diffusion-webui) for our workflow.
@patil-suraj @anton-l let's try to make the library easier to use with stable-diffusion-webui
Duplicate of this issue that's gaining some momentum: https://github.com/huggingface/diffusers/issues/672 (should eventually at least solve the initial problem this ticket was about)
I have a fix in PR #701
Looking!
Merged! Thank you!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Could someone please explain why diffusers chose to use a complex file structure to hold their models? I'm confused by this design choice...wouldn't using pytorch's built-in save function to make .ckpt files from a nn.Module be way simpler?
Good question @RyannDaGreat,
I'm sorry that the file structure appears to be complex. I've tried to make it as intuitive as possible but seem to have failed here a bit :sweat_smile:
In short the logic is the following:
- Diffusion pipelines consist of multiple independently trained `nn.Module`s; e.g. the VAE, the UNet, and the text encoder are usually all trained independently. This has two implications:
  - VAE weights are not dependent on the UNet -> we rarely need to pass gradients from the UNet to the VAE during training -> we don't need to wrap the UNet and the VAE under the same `nn.Module`.
  - We want to be able to swap out individual components. E.g. it's trivial to replace the VAE with an improved version: https://huggingface.co/stabilityai/sd-vae-ft-mse#how-to-use-with-%F0%9F%A7%A8-diffusers -> this would not be as simple if there were only one ckpt file.
- Pipelines are highly configurable and will probably continue to be so. To be able to handle more complex future architectures, we deemed a folder system to be a more stable API than a single file.
- The folder structure is designed to be intuitively understandable. It's based on the following thinking:
  - Every folder is named exactly like the corresponding attribute of the pipeline. E.g. if `pipe.unet` exists and points to the `nn.Module` UNet, then the folder has to be called `"unet"` (see here). So there shouldn't be any ambiguity regarding naming.
  - Every folder is independently loadable. E.g. everything related to the `"unet"` can be found in the `"unet"` folder and loaded individually with `model = UNet2DConditionModel.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="unet")`. This is quite important when training only parts of the model, as is done in textual inversion. Note that if everything were in a single file, training textual inversion would require loading/saving many more parameters than necessary.
- Finally, things become quite complicated when models get large (GPT-3-like large): files as big as 10GB should be sharded into multiple files anyway and loaded from a folder structure, otherwise many internet connections won't be able to download such large files efficiently.
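The sharding point above can be illustrated in plain Python: split a large state dict into size-bounded shards and record which shard holds each key. This mirrors the idea behind Hugging Face's sharded checkpoints, but the logic here is simplified for illustration, with plain ints standing in for tensor byte counts:

```python
# Toy sketch of checkpoint sharding: group keys into shards whose total
# (approximate) size stays under a budget, and build an index that maps
# every key to the shard that holds it.

def shard_state_dict(sizes, max_shard_size):
    """sizes: {key: nbytes}. Returns (shards, index), where shards is a
    list of key lists and index maps each key to its shard's position."""
    shards, index = [], {}
    current, current_size = [], 0
    for key, nbytes in sizes.items():
        # Start a new shard if adding this key would exceed the budget.
        if current and current_size + nbytes > max_shard_size:
            shards.append(current)
            current, current_size = [], 0
        current.append(key)
        current_size += nbytes
        index[key] = len(shards)
    if current:
        shards.append(current)
    return shards, index

sizes = {"unet.conv_in": 6, "unet.mid": 7, "vae.decoder": 4, "text.emb": 5}
shards, index = shard_state_dict(sizes, max_shard_size=10)
# shards -> [["unet.conv_in"], ["unet.mid"], ["vae.decoder", "text.emb"]]
```

A downloader can then fetch shards in parallel, and a loader that only needs one component can consult the index and skip the rest.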
Hope this clarifies things a bit :-)
Thank you @patrickvonplaten, your reasoning makes sense! Being able to break it down into modular chunks is appealing. A single .ckpt is bulky, I agree. Being able to swap out just the u-net sounds like it could save a lot of harddrive space when fine-tuning the models multiple times.
I just wish Hugging Face would stick to a single format though; many other codebases just use a single .ckpt file (like Hugging Face has available for download), and it seems incompatible with this format. This means people have to choose to develop for one format or the other, splintering projects into two possible formats. And that has consequences when people create dreambooth models (someone gave me a .ckpt file they made using dreambooth, and it's not compatible with my code, since my code uses the diffusers library).
For example, let's say some new research paper comes out that fine-tunes SD, a sort of second dreambooth (we'll call it hallucibooth lol). Hallucibooth will fine-tune SD's weights, but has to choose which format to use... why not unify it?
After looking at the parameter names contained inside the full .ckpt file and in `UNet2DConditionModel.parameters()`, it seems non-obvious how to convert between them, as they use different names for everything. Since both formats are released by Hugging Face... why not stick with just one format or the other?
UPDATE: I've had success after learning how to use https://github.com/huggingface/diffusers/blob/main/scripts/convert_original_stable_diffusion_to_diffusers.py ! Still confused how we got to the point where we need such a script tho lol - but it does work well :)
Note that the single .ckpt file doesn't come from HF; that's the original format stable diffusion used.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hi Patrick,
just saw how knowledgeable you are. I was wondering if you have any converter in mind that will convert embeddings in .pt format to safetensors? I have tried to convert a few with the tools on HF, but they all run into an error saying a "state_dict" was expected.
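That "state_dict" error often happens because textual-inversion .pt files wrap the embedding in a nested dict rather than storing a flat state dict, which is what safetensors expects. Below is a sketch of the flattening step, with plain lists standing in for tensors; the nesting shown (a `"string_to_param"` key, as A1111-style embeddings commonly use) is an assumption, so inspect your own file's keys first. With a real file you would `torch.load` the .pt and pass the flat dict to `safetensors.torch.save_file`:

```python
# Sketch of flattening a nested textual-inversion .pt object into the flat
# {name: tensor} mapping that safetensors expects. Lists stand in for
# tensors; the recognized layouts are assumptions, not an exhaustive list.

def flatten_embedding(pt_obj):
    """Pull the token -> tensor mapping out of a nested .pt structure."""
    if "string_to_param" in pt_obj:        # A1111-style textual inversion
        return dict(pt_obj["string_to_param"])
    if "emb_params" in pt_obj:             # another layout seen in the wild (assumed)
        return {"emb_params": pt_obj["emb_params"]}
    raise ValueError(f"unrecognized embedding layout: {sorted(pt_obj)}")

# Hypothetical .pt contents: the tensor is nested, plus training metadata.
pt_obj = {"string_to_param": {"*": [0.1, 0.2]}, "name": "my-token", "step": 500}
flat = flatten_embedding(pt_obj)
```

The metadata keys (`"name"`, `"step"`) are dropped deliberately, since safetensors stores tensors only.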