
Natively support loading CKPT files

Open damian0815 opened this issue 2 years ago • 24 comments

Hi, I'm one of the developers of InvokeAI, a Stable Diffusion web UI targeting professionals and content-creation industry use cases.

We are migrating from a CompVis to a Diffusers backend, and have recently determined that Diffusers cannot actually load ckpt models natively. We're aware of the conversion script, but this doubles disk usage and is awkward. This is going to be a pain point for some of our users, who will expect to be able to flexibly use .ckpt models trained with various different versions of Dreambooth with our UI, and potentially have two webUI projects sharing a single data folder.

Are there plans to implement native ckpt loading in Diffusers at some point?

damian0815 avatar Dec 07 '22 16:12 damian0815

I would like to second this - Model standardization is going to be a key element of the ecosystem's success. Open to discuss all potential solutions that would lead us to this end-goal.

hipsterusername avatar Dec 07 '22 16:12 hipsterusername

Good idea. We have so many models in .ckpt and users don’t want to convert what they already have.

netsvetaev avatar Dec 07 '22 17:12 netsvetaev

Q: Why require a conversion to Diffusers Model and not just use ckpt files? A: https://github.com/huggingface/diffusers/issues/468#issuecomment-1300296884

We want to be able to switch out individual components. E.g. it's trivial to switch out the VAE with an improved version: https://huggingface.co/stabilityai/sd-vae-ft-mse#how-to-use-with-%F0%9F%A7%A8-diffusers -> this would not be as simple if there were only one ckpt file.

averad avatar Dec 07 '22 17:12 averad

Q: Why require a conversion to Diffusers Model and not just use ckpt files? A: #468 (comment)

We want to be able to switch out individual components. E.g. it's trivial to switch out the VAE with an improved version: https://huggingface.co/stabilityai/sd-vae-ft-mse#how-to-use-with-%F0%9F%A7%A8-diffusers -> this would not be as simple if there were only one ckpt file.

We understand the reasoning behind it. But the major issue here is that the majority of the Stable Diffusion models in the community are shared in the ckpt format as a single file. As a result, we'd have to force every user to make this conversion. In and of itself, that would not have been an issue, because the conversion is a one-time process that takes around 40-50 seconds, but the major issue is the disk space. Effectively you are doubling the amount of disk space a user needs to allocate in order to have both a diffusers version and the regular checkpoint.

For most users this is a deal breaker. And considering that the majority of the SD implementations out there use the ckpt format, including ours, pushing users to maintain both on their drives feels like a big ask that would be frowned upon.

blessedcoolant avatar Dec 07 '22 18:12 blessedcoolant

@blessedcoolant discussion about the file size and potential updates: https://github.com/huggingface/diffusers/issues/1501

averad avatar Dec 07 '22 19:12 averad

This conversation is currently about ckpt files, but as we're already seeing safetensors gain some traction in non-🧨diffusers applications, I expect the same issues will pop up around safetensors too.

Does that sound right? Because safetensors is a new serialization format, but it's not an exchange format specifically for diffusers models, so the same renamings that happen in the conversion scripts will still be required.

The one possibility that I hold out maybe a little bit of hope for is that since it's a simpler format that doesn't involve pickle, maybe it'll be more feasible to do those conversions on the fly?
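For what it's worth, the on-disk layout is simple: the file starts with an 8-byte little-endian header length, followed by a JSON header mapping tensor names to dtype/shape/offsets, then the raw tensor data. So tensor names and shapes can be listed with the stdlib alone, no pickle involved. A minimal sketch against a made-up in-memory file (the tensor name and shape are invented for illustration):

```python
# Sketch: read a safetensors header without any ML libraries or pickle.
import io
import json
import struct

def read_safetensors_header(f):
    """Return the JSON header (tensor name -> dtype/shape/offsets)."""
    (header_len,) = struct.unpack("<Q", f.read(8))  # 8-byte LE length
    return json.loads(f.read(header_len))

# Build a tiny in-memory "file" with one fake float32 tensor of 4 values.
header = {"vae.weight": {"dtype": "F32", "shape": [2, 2], "data_offsets": [0, 16]}}
header_bytes = json.dumps(header).encode("utf-8")
blob = struct.pack("<Q", len(header_bytes)) + header_bytes + b"\x00" * 16

parsed = read_safetensors_header(io.BytesIO(blob))
names = list(parsed)
print(names)  # ['vae.weight']
```

Since the header is plain JSON, an application could inspect shapes and names cheaply before deciding how (or whether) to convert, which is what makes on-the-fly handling plausible here in a way it isn't for pickle-based ckpt files.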

keturn avatar Dec 07 '22 19:12 keturn

@averad - Appreciate that this has been considered; I think the proposal we're making here is that the conclusion reached (just run a conversion script) is not practical or realistic.

Model proliferation is an emerging phenomenon - I've counted no less than 3 new model-oriented community sites shared by users in the last week. Users are generating new (non-converted) .ckpt files daily, in the various open-source training tools available.

Having users convert these is an uphill battle, and there isn't a strategic position to enforce or incentivize that approach. Professionals and enthusiasts (our core userbase) have 10s if not 100s of GB of models already. Doubling the amount of disk space required to keep models in compatible states with multiple tools is effectively a non-starter for most - Not to mention that the non-converted .ckpt filetype is the de facto standard.

Is there any evidence that there is meaningful buy-in to move to the diffusers model as a standard across all of the training repos, UIs, and model marketplaces out there so that conversion won't be needed?

hipsterusername avatar Dec 07 '22 20:12 hipsterusername

My understanding is Huggingface's position has been they would like Automatic1111 to convert over to Diffuser Models 🤷. Users gain more functionality using 🤗 Diffuser Models than with ckpt files alone, at the cost of some disk space.

A Huggingface rep would need to answer your question directly, I am just providing information here to make that conversation smoother.

If you're not already in the 🤗 Discord you should join. https://discuss.huggingface.co/t/join-the-hugging-face-discord/11263

averad avatar Dec 07 '22 20:12 averad

Will add to that: a lot of online storage methods are having issues with Diffusers weights — breaking up downloads, renaming bin files, and downloading them separately from the ZIP archive.

Dealing with easy delivery of a ZIP from a storage solution like gdrive is more work than just downloading a single file: you're unzipping, renaming a bin file, and moving it where it should be.

WASasquatch avatar Dec 08 '22 01:12 WASasquatch

Hey @damian0815,

Thanks for bringing this up - that's an important topic!

In short, I completely agree that we should try to unify the saving format - it's very annoying to have two formats.

As mentioned before, the reason why we have a more general (folder format) saving format in diffusers is:

a) So that we stay more flexible for future models / pipelines. E.g. models like Dalle-2 and Imagen have multiple schedulers, unets, etc... which are usually trained independently and can be swapped in & out. I do see though that it's simply more portable to just have a single file.

b) We absolutely want the saved pipeline to be self-contained, meaning all files / configs needed to run the pipeline are present (which e.g. for stable diffusion includes tokenizers, models, configs).

That being said, I do understand that a single file is easier to share and I also agree with the statement:

But the major issue here is that the majority of the Stable Diffusion models in the community are shared with each other in the ckpt format as a single file.

This however is a bit unfortunate, because it quickly becomes impossible to "guess" the correct architecture just from a PyTorch checkpoint file. The more different architectures are supported, the more we need configuration files -> we simply cannot infer model types from tensor shapes.
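A toy illustration of the ambiguity (the key and shape below are plausible stand-ins, not real checkpoint contents; the point is that SD v2-base and SD v2-768 differ in config, e.g. the prediction type, not in weight layout):

```python
# Two stand-in "checkpoints" for two different model variants: identical
# keys, identical tensor shapes. Nothing in the weights alone says which
# config (and therefore which architecture/sampling behavior) to use.
v2_base = {"model.diffusion_model.input_blocks.0.0.weight": (320, 4, 3, 3)}
v2_768  = {"model.diffusion_model.input_blocks.0.0.weight": (320, 4, 3, 3)}

indistinguishable = (v2_base == v2_768)
print(indistinguishable)  # True — shape inspection cannot disambiguate
```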

So to me, I really don't see how we can have a general, maintainable "single-file" saving format.

We can definitely add a from_pretrained_ckpt(...) function to StableDiffusionPipeline that tries to guess the correct model type and then converts the checkpoint on the fly into the diffusers format, but given that we already have different model types that have exactly the same weights layout (SD v2-base and SD v2-768), we cannot guarantee that this loading function will always work correctly. To remedy this, we could let people add another configuration argument?

@classmethod
def from_pretrained_ckpt(cls, name_or_path, yaml_config):
    ...

This could/should work. But I think sadly many people don't share the yaml_config of their ckpt model, no?

Thoughts on this?

patrickvonplaten avatar Dec 08 '22 19:12 patrickvonplaten

We can definitely add a from_pretrained_ckpt(...) function to StableDiffusionPipeline that tries to guess the correct model type and then converts the checkpoint on the fly into the diffusers format

@patrickvonplaten - When you say 'converts the checkpoint on the fly', are you suggesting that it runs the 40-50s conversion process on load, or just that it takes the .ckpt file and yaml_config to offer support to standard ckpt files? The latter would be the desired state, since that would allow for users to maintain a single .ckpt without duplication, and for interfaces (like Invoke) to manage the config arguments for users. If that's the case, I think that's effectively what we'd be looking for.

Invoke manages the following for each Checkpoint file users load into the system:

  • config yaml (e.g., v2-inference)
  • related vae to load (if applicable)
  • other internal flags like model name, description, model width & height, model settings, etc.

hipsterusername avatar Dec 08 '22 20:12 hipsterusername

a) so that we stay more flexible for future models / pipelines. E.g. models like Dalle-2 and Imagen have multiple schedulers, unets, etc... which are usually trained independently and can be swapped in & out.

This seems like advanced, manual usage rather than the general usage most people are looking to get out of diffusers, and imo it isn't a good argument for this whole case. Who is actually doing these things, versus the thousands and thousands who aren't? Dall-e? I haven't even heard of anyone making good use of Diffusers for Dalle, or really anything other than Stable Diffusion in this regard. Having this support is great — like you said, more flexible — but it is not standard, it's the opposite. And given the standardization efforts elsewhere, I'm not sure why we couldn't have it here for the other 95% of the synthesis world.

This sort of functionality falls back on ideas like adding Guided Diffusion, which is also a pt/ckpt format, and it wouldn't be intuitive to just ask people to convert to another format.

Edit: One could easily say there are millions of users using diffusers via end-point services, exclusively for Stable Diffusion, and I feel I can assume this absolutely dwarfs any minority like Dalle usage, etc.

WASasquatch avatar Dec 08 '22 20:12 WASasquatch

What do you think about this? https://github.com/unifyai/ivy

camenduru avatar Dec 08 '22 20:12 camenduru

Just want to second @WASasquatch 's comments here. Not supporting ckpt after it has become such a standard in the community is somewhat like Adobe deciding to drop JPEG support from Photoshop because its compression is not as good as newer formats. The point isn't whether it's the best format for the job; the point is that almost literally everyone in the broader community is using this format. Adding support for a more nuanced format is one thing; making users jump through a bunch of extra hoops and waste a bunch of disk space in order to import the current standard format is entirely another.

Standards aren't determined by fiat, they're like word usages. You can't go to Merriam-Webster and tell them your definition of the word "art" is more utilitarian and therefore they should throw out the other definitions. The standard is what everyone is using.

Standards organizations don't work that way either, the "standards" they create become standards once everyone starts using them, not when they declare them a standard.

Until the community switches over to your new format (which seems unlikely at least with current trends) you really should support the standard out of the box, which today is ckpt.

I feel like this isn't even something that should be controversial

nina-tehkar avatar Dec 08 '22 21:12 nina-tehkar

@patrickvonplaten In our experience the vast majority of .ckpt files floating around in the wild are Dreambooth-like finetunings based off of the base v1.5 model, which are 100% compatible with the v1-inference.yaml that shipped with compvis-flavoured SD. It is enough, for example, to simply tell users to use v1-inference if they don't know what yaml is appropriate with their model.

As SDv2 spreads and usage picks up, I would expect that, similarly, the vast majority of .ckpt (or, hopefully from now on, .safetensors) files will be based off of v2-inference.yaml. The changes between v1-inference and v2-inference are small enough that it's even conceivable that, if the user does not know what they need, our code could do a tentative load of a checkpoint to programmatically determine which config file it likely needs. These files are small and the breaking changes are the sorts of things that we can just track in code.
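The tentative-load idea could be sketched as a key-based heuristic. SD v1 CompVis checkpoints store the CLIP text encoder under a `cond_stage_model.transformer.` prefix, while SD v2 checkpoints use open_clip under `cond_stage_model.model.`; the toy dicts below are stand-ins with dummy values, and a real loader would want more checks than this:

```python
# Hedged sketch: guess the inference yaml from state-dict key prefixes
# rather than tensor values.
def guess_config(state_dict):
    keys = list(state_dict)
    if any(k.startswith("cond_stage_model.model.") for k in keys):
        return "v2-inference.yaml"   # open_clip text encoder -> SD v2 family
    if any(k.startswith("cond_stage_model.transformer.") for k in keys):
        return "v1-inference.yaml"   # HF CLIP text encoder -> SD v1 family
    return None                      # unknown: fall back to asking the user

v1_ckpt = {"cond_stage_model.transformer.text_model.final_layer_norm.weight": 0}
v2_ckpt = {"cond_stage_model.model.ln_final.weight": 0}

print(guess_config(v1_ckpt))  # v1-inference.yaml
print(guess_config(v2_ckpt))  # v2-inference.yaml
```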

In short - model standard proliferation is not a problem we are experiencing support queries for. However, if/when we switch to Diffusers, I can guarantee that without native ckpt loading we will have to field support queries and outright complaints that we can't just load users' .ckpt files without a disk-hungry conversion process.

damian0815 avatar Dec 08 '22 22:12 damian0815

Please fork diffusers and add .ckpt functionality.

camenduru avatar Dec 08 '22 22:12 camenduru

One could easily say there are millions of users using diffusers via end-point services, exclusively for Stable Diffusion, and I feel I can assume this absolutely dwarfs any minority like Dalle usage, etc.

Friend Sasquatch, if anyone in this thread is trying to throw their weight around by claiming they represent "millions" of users, they can gosh darn well afford to hire someone to implement this on-the-fly conversion and submit the PR.

Software-as-a-service hosts won't, though, because it's far easier for them to pay the one-time conversion cost and discard the old version.

keturn avatar Dec 08 '22 23:12 keturn

One could easily say there are millions of users using diffusers via end-point services, exclusively for Stable Diffusion, and I feel I can assume this absolutely dwarfs any minority like Dalle usage, etc.

Friend Sasquatch, if anyone in this thread is trying to throw their weight around by claiming they represent "millions" of users, they can gosh darn well afford to hire someone to implement this on-the-fly conversion and submit the PR.

Software-as-a-service hosts won't, though, because it's far easier for them to pay the one-time conversion cost and discard the old version.

You misunderstand. This is talking about all the services that are deployed, by separate people, and all the users using those services. I know one service alone that broke 1 million registered users. And it is certainly not users' job to maintain diffusers. It shouldn't even be controversial to ask for better compatibility with the user base; that's really rather ridiculous to assert, imo. I'll add that this doesn't even seem to be an issue of PRs, but of the almost-politics of diffusers/HF that turn things like wanting standardization into controversy and debate. Yet diffusers talks about standardization, just in the wrong light.

And it's bad enough that diffusers development breaks all these services and users' personal implementations at a whim, without prior warning or proper documentation of planned changes. A lot of them are pretty arbitrary, like changing all pipes to use image instead of init_image, which is far more descriptive of what the variable carries and does than a generic "image", and was already standard across ML image synthesis.

There really needs to be tighter QA on HF projects, imo, representative of the actual people using diffusers as an open-source API.

WASasquatch avatar Dec 09 '22 05:12 WASasquatch

I just found this tool to Convert a Stable Diffusion Checkpoint to Diffusers on HuggingFace Spaces: https://huggingface.co/spaces/anzorq/sd-to-diffusers https://colab.research.google.com/gist/qunash/f0f3152c5851c0c477b68b7b98d547fe/convert-sd-to-diffusers.ipynb Haven't tried it yet, but looks like what we're looking for. I've been finding a bunch of nice finetuned models that I want to use, then discover they're .ckpt only and can't be loaded. It'd be nice if a version of this util is built in somehow.. or even better to load a .ckpt file directly even though it's not as complete.

Skquark avatar Dec 11 '22 20:12 Skquark

I just found this tool to Convert a Stable Diffusion Checkpoint to Diffusers on HuggingFace Spaces: https://huggingface.co/spaces/anzorq/sd-to-diffusers https://colab.research.google.com/gist/qunash/f0f3152c5851c0c477b68b7b98d547fe/convert-sd-to-diffusers.ipynb Haven't tried it yet, but looks like what we're looking for. I've been finding a bunch of nice finetuned models that I want to use, then discover they're .ckpt only and can't be loaded. It'd be nice if a version of this util is built in somehow.. or even better to load a .ckpt file directly even though it's not as complete.

These conversion tools have been around (though they regularly suffer problems), and are no substitute or alternative. Diffusers weights are obtrusive, hard to deal with programmatically for services, and just a bad idea for general use, despite the intention of arbitrarily swapping components out — manual, advanced usage that could be kept as such, for such people. But standardization needs to happen, and it ain't Diffusers/Transformers weights.

I can admit that's cool... for R&D, to be able to swap vaes, etc, but for utilizing community models? Entirely irrelevant.

WASasquatch avatar Dec 11 '22 20:12 WASasquatch

As we're already seeing safetensors gain some traction in non-diffusers applications, I expect the same issues will pop up around safetensors too.

safetensors is now natively supported by diffusers :-) We've also opened a PR here: https://huggingface.co/runwayml/stable-diffusion-v1-5/discussions/46 and want to open PRs to all the other highly used models as well.

patrickvonplaten avatar Dec 13 '22 19:12 patrickvonplaten

I think we can build a from_ckpt function that converts weights on the fly, where we automatically infer the correct architecture. This is a very brittle function though, since checkpoints should be shared with a configuration file and not just as weights.

We cannot naively support the model architecture of .ckpt because:

  • .ckpt has all components "vae", "unet", "clip" entangled. We don't want this because each of those can already be swapped out (note SD 1 uses openai clip whereas SD2 uses open_clip), and this will certainly be done more in future models (e.g. Imagen uses T5 instead of "clip", eDiffi uses a mixture of T5 and CLIP for the text encoder). Note this is also the main reason most dreambooth & textual inversion scripts are based on diffusers.
  • With ckpt we cannot leverage transformers' from_pretrained(...) method because the text encoder is entangled in the checkpoint format -> we want to leverage transformers though, to not reinvent the wheel.
  • It's too backwards-breaking, as all of our existing checkpoints (6000+ on the Hub) would no longer be compatible.

=> So the best we can do here is to create an "on-the-fly" conversion that is basically the same as: https://github.com/huggingface/diffusers/blob/main/scripts/convert_original_stable_diffusion_to_diffusers.py
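To make the entanglement point concrete, here is a toy sketch of the first step such an on-the-fly conversion has to do: split the single flat CompVis state dict into per-component dicts. The three prefixes are the real CompVis conventions; the dummy keys and values are illustrative, and the actual conversion script additionally remaps many per-layer names inside each component:

```python
# Sketch: split a flat CompVis-style state dict into diffusers-style
# per-component state dicts by key prefix.
PREFIXES = {
    "first_stage_model.": "vae",
    "model.diffusion_model.": "unet",
    "cond_stage_model.": "text_encoder",
}

def split_components(state_dict):
    components = {name: {} for name in PREFIXES.values()}
    for key, tensor in state_dict.items():
        for prefix, name in PREFIXES.items():
            if key.startswith(prefix):
                # Strip the prefix so the key matches the standalone module.
                components[name][key[len(prefix):]] = tensor
                break
    return components

ckpt = {
    "first_stage_model.encoder.conv_in.weight": 0,
    "model.diffusion_model.input_blocks.0.0.weight": 0,
    "cond_stage_model.transformer.text_model.final_layer_norm.weight": 0,
}
parts = split_components(ckpt)
print(sorted(parts["unet"]))  # ['input_blocks.0.0.weight']
```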

Does this make sense?

patrickvonplaten avatar Dec 13 '22 19:12 patrickvonplaten

Would anybody be interested in opening a PR here? :-)

patrickvonplaten avatar Dec 13 '22 19:12 patrickvonplaten

we want to leverage transformers though to not reinvent the wheel

This is a bit ironic, considering diffusers is known around town for reinventing the wheel and for "use our format" / "you should adopt our format", instead of an overwhelmingly more popular, standard format — despite the arguments used here, which relate back to reinventing the wheel and standards already in place.

I feel diffusers can keep its integrity with its own format, and provide native CKPT support separately for the systems it can be used with — naturally limited to the format's limits.

WASasquatch avatar Dec 15 '22 20:12 WASasquatch

Is this https://github.com/huggingface/diffusers/blob/main/scripts/convert_original_stable_diffusion_to_diffusers.py now available as a module that has a single function-call entry point? The current script still has a lot of setup being done in the main section.

lstein avatar Dec 20 '22 19:12 lstein

Q: Why require a conversion to Diffusers Model and not just use ckpt files? A: #468 (comment) We want to be able to switch out individual components. E.g. it's trivial to switch out the VAE with an improved version: https://huggingface.co/stabilityai/sd-vae-ft-mse#how-to-use-with-%F0%9F%A7%A8-diffusers -> this would not be as simple if there were only one ckpt file.

We understand the reasoning behind it. But the major issue here is that the majority of the Stable Diffusion models in the community are shared in the ckpt format as a single file. As a result, we'd have to force every user to make this conversion. In and of itself, that would not have been an issue, because the conversion is a one-time process that takes around 40-50 seconds, but the major issue is the disk space. Effectively you are doubling the amount of disk space a user needs to allocate in order to have both a diffusers version and the regular checkpoint.

For most users this is a deal breaker. And considering that the majority of the SD implementations out there use the ckpt format, including ours, pushing users to maintain both on their drives feels like a big ask that would be frowned upon.

I did the conversion and now I'm getting an error. It didn't produce a yaml file.

FurkanGozukara avatar Jan 01 '23 13:01 FurkanGozukara

Is this https://github.com/huggingface/diffusers/blob/main/scripts/convert_original_stable_diffusion_to_diffusers.py now available as a module that has a single function-call entry point? The current script still has a lot of setup being done in the main section.

i used this and i got error : https://github.com/huggingface/diffusers/issues/1877

Edit:

Since it's related to the topic, I am adding more info that I have found.

For those who wonder, I have easier ways to:

1: convert from diffusers to ckpt
2: convert from ckpt to diffusers
3: convert from safetensors to ckpt or diffusers

1: I have a video that explains how: How to Run and Convert Stable Diffusion Diffusers (.bin Weights) & Dreambooth Models to CKPT File

1: Alternative to the above link: you can upload diffusers to google colab and use the generate ckpt option there: Stable Diffusion Google Colab, Continue, Directory, Transfer, Clone, Custom Models, CKPT SafeTensors

2: We are using automatic1111; again, in this video I explain: How to Run and Convert Stable Diffusion Diffusers (.bin Weights) & Dreambooth Models to CKPT File

3: We are using the automatic1111 web ui again: How to Run and Convert Stable Diffusion Diffusers (.bin Weights) & Dreambooth Models to CKPT File

I can't say these are the optimal ways to do it, however they work.

FurkanGozukara avatar Jan 01 '23 13:01 FurkanGozukara

Yes it is. See ldm/invoke/ckpt_to_diffuser.py for the function call, and ldm/invoke/model_cache.py for a method that converts and imports into models.yaml.

lstein avatar Jan 02 '23 17:01 lstein

Sorry to only come back to this issue now.

@FurkanGozukara, it would be super nice if you could open a separate issue for this rather than commenting here, as this is not related to the issue at all. Thanks!

I think in a first step what we should/could do here is to wrap the conversion script: https://github.com/huggingface/diffusers/blob/main/scripts/convert_original_stable_diffusion_to_diffusers.py into a function and put it under src/diffusers so that it starts to become versioned, cc @anton-l @patil-suraj what do you think here?

Regarding the load_ckpt function, this would be a bigger project as providing such a class method involves a lot of API decisions:

  • Do we allow loading directly from the Hub? This would need some specific loading functionality and would open the door for bigger mismatches with from_pretrained. If we don't, people will ask for it soon.
  • Do we force a config.yaml to be provided? If yes, people will complain that they don't have a config.yaml. If no, we'll get tons of issues because the format wasn't automagically guessed right.
  • How do we maintain this function? There is simply no common CKPT format. Some CKPTs out there include EMA weights, some don't; some are wrapped in a "state_dict" key, some aren't; some include the VAE weights, some don't, ...
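The maintenance headache in that last bullet can be made concrete with a small sketch of the normalization a loader would need before any conversion even starts. The "state_dict" wrapper is the common CompVis convention; the "model_ema." prefix here is illustrative, since real EMA key naming varies between training scripts:

```python
# Hedged sketch: normalize a checkpoint dict by unwrapping an optional
# "state_dict" wrapper and optionally preferring EMA copies of weights.
def normalize_checkpoint(ckpt, prefer_ema=True):
    sd = ckpt.get("state_dict", ckpt)            # unwrap if wrapped
    ema = {k[len("model_ema."):]: v for k, v in sd.items()
           if k.startswith("model_ema.")}
    plain = {k: v for k, v in sd.items()
             if not k.startswith("model_ema.")}
    if prefer_ema and ema:
        plain.update(ema)                        # overlay the EMA copies
    return plain

wrapped = {"state_dict": {"unet.w": 1, "model_ema.unet.w": 2}, "global_step": 100}
print(normalize_checkpoint(wrapped))             # {'unet.w': 2}
```

Every variation (wrapped or not, EMA or not, VAE included or not) multiplies the cases such a function has to handle, which is exactly why a versioned conversion helper is easier to maintain than a guarantee-everything public loader.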

To avoid such headaches in the beginning, I'd advocate to just create a nice conversion function and put it in src/diffusers that won't however be part of the public API.

@pcuenca @anton-l @patil-suraj @keturn @lstein wdyt, would this suffice for now?

patrickvonplaten avatar Jan 05 '23 22:01 patrickvonplaten

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Jan 30 '23 15:01 github-actions[bot]