
Initial setup of lora support


I wanted to get Lora support going, so I decided to take a crack at it.

It uses the same syntax as https://github.com/AUTOMATIC1111/stable-diffusion-webui like so:

prompt: main prompt, <lora:name_of_lora:1>

Right now the weight does nothing; it is always treated as 1. Also, the LoRA must be trained in the diffusers format: https://huggingface.co/docs/diffusers/training/lora
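
For anyone who wants to poke at the syntax handling, here is a minimal sketch of how such a token could be pulled out of a prompt. The regex and function names are illustrative only, not the PR's actual parser:

```python
import re

# Hypothetical parser for A1111-style "<lora:name:weight>" prompt tokens.
LORA_TOKEN = re.compile(r"<lora:(?P<name>[^:>]+):(?P<weight>[0-9.]+)>")

def extract_loras(prompt: str):
    """Return the prompt with LoRA tokens stripped, plus (name, weight) pairs."""
    loras = [(m.group("name"), float(m.group("weight")))
             for m in LORA_TOKEN.finditer(prompt)]
    cleaned = LORA_TOKEN.sub("", prompt).strip(" ,")
    return cleaned, loras

# extract_loras("main prompt, <lora:name_of_lora:1>")
# -> ("main prompt", [("name_of_lora", 1.0)])
```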

This also requires the base diffusers version to be raised from 0.11 to 0.13

It should be able to support multiple LoRAs, but I have not tested that deeply. This is far from ready, but it might be useful for anyone wanting to experiment with LoRA in InvokeAI.

felorhik avatar Feb 18 '23 12:02 felorhik

Turns out finding a LoRA model online that is in the diffusers format is harder than I expected. Most of them are shared as safetensors. I think I'll train my own LoRA, save it in diffusers format, and then try this PR out.

blessedcoolant avatar Feb 18 '23 18:02 blessedcoolant

Getting LoRA support going is great, but as you point out, the vast majority are in safetensors format -- might be better to aim to support that up front?

ChrGriffin avatar Feb 18 '23 18:02 ChrGriffin

I'm working on getting safetensors to work; however, there is some complexity to it.

https://github.com/cloneofsimo/lora uses a format compatible with diffusers and can load safetensors-based LoRA files, provided some patching is done on the pipeline. There are guides in the repo for that, which I have been experimenting with: https://github.com/cloneofsimo/lora/blob/71c8c1dba595d77d0eabdf9c278630168e5a8ce1/scripts/run_inference.ipynb

Most have been trained with https://github.com/kohya-ss/sd-scripts, which uses its own format for the keys. I have been attempting to convert them and save them in diffusers format, but no luck yet.

Given that, I am leaning more towards needing a conversion step for the kohya-scripts format, rather than supporting it natively. Though knowing which method was used to train a given file may be difficult. Still experimenting, but overall diffusers can make a 3 MB LoRA file vs. 150 MB with the kohya method, and I have not seen a difference in quality either.
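
To make the key-format difference concrete, here is a rough sketch of inspecting a kohya-style file. The key layout in the comments is what I have seen from sd-scripts output and should be treated as an assumption, not a spec:

```python
from safetensors.torch import load_file

# Load a kohya-style (sd-scripts) LoRA and group its keys by target model.
# Typical keys (assumed layout) look like:
#   lora_unet_down_blocks_0_attentions_0_..._to_q.lora_down.weight
#   lora_te_text_model_encoder_layers_0_self_attn_q_proj.lora_up.weight
# plus a scalar ".alpha" per module.
state_dict = load_file("path/to/lora_file_name.safetensors")

unet_keys = [k for k in state_dict if k.startswith("lora_unet_")]
text_keys = [k for k in state_dict if k.startswith("lora_te_")]
print(f"{len(unet_keys)} unet tensors, {len(text_keys)} text-encoder tensors")
```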

For training, as it currently stands:

I have had success loading LoRAs trained with the following diffusers training scripts:
https://github.com/huggingface/diffusers/blob/b2c1e0d6d4ffbd93fc0c381e5b9cdf316ca4f99f/examples/dreambooth/train_dreambooth_lora.py
https://github.com/huggingface/diffusers/blob/b2c1e0d6d4ffbd93fc0c381e5b9cdf316ca4f99f/examples/text_to_image/train_text_to_image_lora.py
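
Loading what those scripts produce is simple. A sketch, assuming a diffusers version new enough to have unet.load_attn_procs (which should be the case at the 0.13 this PR bumps to):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Attach LoRA attention weights produced by the training scripts above;
# the path is whatever output dir the training run wrote to.
pipe.unet.load_attn_procs("path/to/lora_output_dir")

image = pipe("main prompt").images[0]
```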

felorhik avatar Feb 19 '23 03:02 felorhik

Is there a script somewhere that allows us to convert LoRA models from safetensors (or whatever format) to diffusers? If there is one, maybe we could integrate it and load through that?

blessedcoolant avatar Feb 19 '23 03:02 blessedcoolant

Added an adjustment which should be able to load safetensors made by https://github.com/cloneofsimo/lora, though I'm still testing/debugging it.

pip install git+https://github.com/cloneofsimo/lora.git is needed as a dependency for it though.
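
Usage roughly follows that repo's README; a sketch (check the repo for the current signatures, they have been moving fast):

```python
import torch
from diffusers import StableDiffusionPipeline
from lora_diffusion import patch_pipe, tune_lora_scale

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Monkey-patch the unet and text encoder so the safetensors LoRA
# is applied, then dial in the strength.
patch_pipe(pipe, "path/to/lora.safetensors",
           patch_text=True, patch_ti=True, patch_unet=True)
tune_lora_scale(pipe.unet, 1.0)
tune_lora_scale(pipe.text_encoder, 1.0)
```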

felorhik avatar Feb 19 '23 04:02 felorhik

I have no idea if any of this information will be useful to you, but maybe it'll at least inspire some fresh Google terms!

I've been using the HuggingFace diffusers package, specifically the LoRA Dreambooth example (the same one you linked) with basically zero changes, to train my LoRAs. Unfortunately, the package produces a .bin file that isn't compatible with A1111 (changing the extension isn't enough). A little Googling brought me to this thread, with the script by ignacfetser at the bottom.

It's hardly the most... uh... structurally sound solution, but I can confirm that for now at least, it does work.

Of course, this isn't the exact issue you're encountering, but I figured I'd drop it here if any of the information was helpful.

ChrGriffin avatar Feb 19 '23 04:02 ChrGriffin

@ChrGriffin It did help; it gave me a good direction for a conversion script.

It saves in diffusers format and can be loaded after running:

python ./scripts/convert_lora.py --lora_file=path/to/lora_file_name.safetensors

Although I have not tested it yet, it should work with ckpt too:

python ./scripts/convert_lora.py --lora_file=path/to/lora_file_name.ckpt

It should save to ./models/lora/lora_file_name and be usable by <lora:lora_file_name:1>

Not seeing great results out of it yet, but it is loading into the pipeline at least.

felorhik avatar Feb 19 '23 06:02 felorhik

If the conversion time is short, I think we can effectively do a one-time conversion of the safetensors/ckpt model to diffusers. That might be more ideal because, in the long run, we want to standardize on diffusers. And that way, we can avoid installing the original repo as a dependency and have it work fully through the diffusers pipeline.

blessedcoolant avatar Feb 19 '23 06:02 blessedcoolant

https://github.com/huggingface/diffusers/pull/2403 may be of some interest -- though you've already figured out the key-mapping part.

I tried a variant of that in place of the cloneofsimo monkey patch, with better results (no errors for missing alphas). Still not perfect; it seems like either the math is off somewhere or this LoRA I am testing with is junk...

Using the conversion script does not work for me, likely due to missing text-model encoder layers (lora_te_text_model_encoder_layers). There's also nothing updating the text encoder in LoraManager for diffusers.

Edit: Conversion time from .safetensors to diffusers is fast, so I see no QOL impact from a one-time conversion.
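
For anyone following along, the merge math in question is small. A sketch of folding one LoRA pair into a base weight, including the alpha/rank scaling those missing-alpha errors refer to (names are illustrative):

```python
import torch

def merge_lora_layer(base_weight: torch.Tensor,
                     lora_up: torch.Tensor,
                     lora_down: torch.Tensor,
                     alpha: float,
                     multiplier: float = 1.0) -> torch.Tensor:
    """W' = W + multiplier * (alpha / rank) * (up @ down).

    When a file carries no alpha, implementations commonly fall back to
    alpha == rank, i.e. a scale of 1.0 (an assumption, not a guarantee).
    """
    rank = lora_down.shape[0]
    return base_weight + multiplier * (alpha / rank) * (lora_up @ lora_down)
```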

neecapp avatar Feb 19 '23 06:02 neecapp

Okay. Managed to load a diffusers version of the Lora model but it doesn't seem to be working.

blessedcoolant avatar Feb 19 '23 06:02 blessedcoolant

re: the prompt syntax. is this the way LoRAs are going to be activated, as a prompt term? if so, i'd suggest a more explicit syntax in line with the rest of Invoke. something like withLora(lora_name [, optional weight]), e.g. a cat running in the forest withLora(tiger, 0.5) to apply the tiger.lora (or whatever) model at 50% strength
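
(a quick illustrative sketch of parsing that proposed syntax, with the weight optional and defaulting to 1.0; none of this is real Invoke code:)

```python
import re

# Hypothetical parser for the proposed withLora(name[, weight]) syntax.
WITHLORA = re.compile(
    r"withLora\(\s*(?P<name>[^,)\s]+)\s*(?:,\s*(?P<weight>[0-9.]+))?\s*\)"
)

def parse_withlora(prompt: str):
    return [(m.group("name"), float(m.group("weight") or 1.0))
            for m in WITHLORA.finditer(prompt)]

# parse_withlora("a cat running in the forest withLora(tiger, 0.5)")
# -> [("tiger", 0.5)]
```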

@jordanramstad where is the prompt parser logic happening in your code? i didn't see it anywhere obvious.

damian0815 avatar Feb 19 '23 09:02 damian0815

I believe the reasoning for the syntax is that it mimics A1111's syntax for loading LoRAs, making transitioning back and forth easier.

ChrGriffin avatar Feb 19 '23 22:02 ChrGriffin

success!

@neecapp the PR you linked really helped.

It now supports safetensors made in other formats; it will load and merge them into the current model when it runs, rather than trying to convert them or force them to load in diffusers format.

Converting may still be done, but this allows for the text encoder to be supported as well.

EDIT: I got a little excited with it working. It will re-apply the weights on each execution without clearing them, leading to gradual burning. Working on resetting them after each run to get around that.

felorhik avatar Feb 20 '23 00:02 felorhik

I'll try to look at it again if I have some time, but it looks like you load the LoRA repeatedly on every successive run, which you likely do not want.

On mobile at the moment, but it may be better to hijack text_encoder.forward and unet.forward to apply weights in a callback to a set of functions controlled by lora_manager, so that the prompt multiplier can be dynamically changed and the LoRA is only loaded at most once (depending on prompts).
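
Something like this shape, roughly (the LoraManager wiring and names here are hypothetical, just to show the at-most-once load plus a dynamically adjustable multiplier):

```python
import torch

class LoraLayerHook:
    """Adds a LoRA delta after a module's original forward has run.

    Registered via register_forward_hook, so the original forward always
    executes first. Only the multiplier changes between runs; the base
    weights are never modified, which avoids the gradual "burning".
    """

    def __init__(self, lora_up, lora_down, scale):
        self.lora_up = lora_up      # (out_features, rank)
        self.lora_down = lora_down  # (rank, in_features)
        self.scale = scale          # alpha / rank, read from the file
        self.multiplier = 1.0       # set per-prompt by the manager

    def __call__(self, module, inputs, output):
        x = inputs[0]
        delta = torch.nn.functional.linear(
            torch.nn.functional.linear(x, self.lora_down), self.lora_up
        )
        return output + self.multiplier * self.scale * delta

# Manager side (sketch): register each hook at most once and keep the
# handles so they can be removed when the LoRA is unloaded:
# handles = [module.register_forward_hook(hook), ...]
```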

neecapp avatar Feb 20 '23 01:02 neecapp

Latest commit will break what was working before.

I have started to adjust it to apply layers with the weight data, but I'm running into issues with matching tensors. I don't think I'll solve it tonight, so I have committed it if anyone else wants to take a look.

felorhik avatar Feb 20 '23 09:02 felorhik

Working much better, much more consistently. Thank you @neecapp. Still has some burning on the face at times, and some general issues in the facial detail, but loading is fairly seamless.

I also moved the load a bit further up to where the prompt is being added, to keep them together since it seems fine to load it there.

felorhik avatar Feb 21 '23 04:02 felorhik

> Working much better, much more consistently. Thank you @neecapp. Still has some burning on the face at times, and some general issues in the facial detail, but loading is fairly seamless.
>
> I also moved the load a bit further up to where the prompt is being added, to keep them together since it seems fine to load it there.

Have you tested against A1111 to see if the same "burn-in" happens there? I've run quite a few generations over and over on the web UI without changing models, and I haven't had that issue since the hook rewrite.

neecapp avatar Feb 21 '23 06:02 neecapp

@neecapp Yes, been testing back and forth but I think I figured it out in my last commit.

The forward must run first, so I added an override to always force it to run first. I saw this used in A1111, and that is likely the reason for the difference.

felorhik avatar Feb 21 '23 06:02 felorhik

> @neecapp Yes, been testing back and forth but I think I figured it out in my last commit.
>
> The forward must run first, so I added an override to always force it to run first. I saw this used in A1111, and that is likely the reason for the difference.

The first thing the new version does is call the original forward, which should be the same behavior as registering a hook. The output from the forward call is passed into the hook.

We'd have to find another way to discuss this, but glad you fixed whatever the problem was.

neecapp avatar Feb 21 '23 07:02 neecapp

Shifted things around, @damian0815. I started looking into adding the LoRA cross-attention processor, but adjusting the keys and getting it to properly import the state dictionary was proving to be a challenge.

I have left the code in, just disabled. Also refactored a bit to make it easier to change how it loads in lora.
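
For context, the diffusers-side wiring I was attempting looks roughly like the training examples do it. A sketch; note the class was named LoRACrossAttnProcessor in the 0.13-era diffusers and has been renamed/moved in later releases:

```python
from diffusers import UNet2DConditionModel
from diffusers.models.cross_attention import LoRACrossAttnProcessor

unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)

# Build one LoRA processor per attention module, sized to its block.
lora_attn_procs = {}
for name in unet.attn_processors.keys():
    # Self-attention (attn1) has no cross-attention dim.
    cross_attention_dim = (
        None if name.endswith("attn1.processor")
        else unet.config.cross_attention_dim
    )
    if name.startswith("mid_block"):
        hidden_size = unet.config.block_out_channels[-1]
    elif name.startswith("up_blocks"):
        block_id = int(name[len("up_blocks.")])
        hidden_size = list(reversed(unet.config.block_out_channels))[block_id]
    else:  # down_blocks
        block_id = int(name[len("down_blocks.")])
        hidden_size = unet.config.block_out_channels[block_id]
    lora_attn_procs[name] = LoRACrossAttnProcessor(
        hidden_size=hidden_size,
        cross_attention_dim=cross_attention_dim,
        rank=4,
    )
unet.set_attn_processor(lora_attn_procs)
# The hard part, as noted above: remapping a kohya state dict onto these keys.
```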

felorhik avatar Feb 22 '23 02:02 felorhik

> Shifted things around, @damian0815. I started looking into adding the LoRA cross-attention processor, but adjusting the keys and getting it to properly import the state dictionary was proving to be a challenge.
>
> I have left the code in, just disabled. Also refactored a bit to make it easier to change how it loads in lora.

I had a version working with cross-attention processing, but I'm refactoring a bit as it initially only supported a single LoRA. Doing some testing on n LoRAs, but busy IRL. Will see if I can get some cycles unless someone else wants to chip in.

neecapp avatar Feb 22 '23 15:02 neecapp

Rewrote the hook version and fixed some issues, got that working very well on my end.

I have a version that loads cross attention and works fine, but without support for other modules it doesn't work as well. Could partially patch, but ran out of time for the day.

neecapp avatar Feb 23 '23 02:02 neecapp

> Rewrote the hook version and fixed some issues, got that working very well on my end.
>
> I have a version that loads cross attention and works fine, but without support for other modules it doesn't work as well. Could partially patch, but ran out of time for the day.

Sweet, ty for all your help as well. I have set you as a collaborator on my fork, so you should be able to push if you want. The current version does use the hooks; I just moved things around to make it easier to adjust how it handles different conditions.

felorhik avatar Feb 23 '23 02:02 felorhik

Never mind, looks like they want people to just use peft.

neecapp avatar Feb 23 '23 23:02 neecapp

@neecapp @damian0815

Added a peft setup. It does not work yet, but with a LoRA trained with peft (https://github.com/huggingface/peft) it will try to use it.

The issue is something related to sending things to the right device, but the error just dumps out the entire model, making it hard to diagnose.

It should only take effect with withLora(lora_name,1) when the folder contains a lora_config.json and a lora.pt file; otherwise it will use diffusers. The config is a little different than standard peft, since their training setup puts the instance prompt in front of the file name, though I don't think that is necessary and it makes it hard to scan the dir properly.
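
The load path I'm attempting looks roughly like this (a sketch; it assumes the saved lora_config.json maps cleanly onto LoraConfig's fields, and the .to() call at the end is where the device issue shows up):

```python
import json
import torch
from diffusers import UNet2DConditionModel
from peft import LoraConfig, get_peft_model, set_peft_model_state_dict

lora_dir = "models/lora/lora_name"  # holds lora_config.json and lora.pt

# Rebuild the peft config the training run saved, wrap the unet with it,
# then restore the trained LoRA weights on top.
with open(f"{lora_dir}/lora_config.json") as f:
    config = LoraConfig(**json.load(f))

unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)
unet = get_peft_model(unet, config)
state_dict = torch.load(f"{lora_dir}/lora.pt", map_location="cpu")
set_peft_model_state_dict(unet, state_dict)
unet.to("cuda", dtype=torch.float16)  # the device/dtype step that misbehaves
```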

On another note, this makes three variations of LoRA: diffusers, peft, and "legacy" (kohya scripts). While support for all is nice, it does feel like we should focus on one; the code is getting kind of messy having support for the different types atm.

felorhik avatar Feb 26 '23 03:02 felorhik

I've been reading when I can. To be honest, as much as people may want the "legacy" variant to go away, that format covers 99.99% of all existing LoRA models. I don't think I've even seen a diffusers LoRA outside of engineer repos -- with only unet support, they aren't very good. I doubt the kohya variant is going anywhere anytime soon.

When the legacy format dominates and "just works" with A1111, most people will not think of it as legacy; they will see it as Invoke not having good LoRA support.

Haven't had much availability to think about a proper design for all of this, but there needs to be a good, low-friction way to support kohya LoRAs. In all reality, peft isn't a solution to the problem -- at least not yet.

Probably an unpopular opinion, but I see supporting legacy as the immediate need with anything else as secondary, which can be done with follow-on PRs as PEFT et al. mature.

All of this is up to the Invoke team, of course. Apart from this PR, I'm not really active on here.

Edit: Source of the use peft for lora comment: https://github.com/huggingface/transformers/pull/21770

neecapp avatar Feb 26 '23 03:02 neecapp

That may be an unpopular opinion within the devs, but as a user, I'm here saying I don't care -- at all -- about supporting the diffusers format of LoRAs. Effectively none of the LoRAs available to "average users" are in the diffusers format. For example, take a quick browse of the LoRAs available on Civitai and you'll find that all of them are in safetensors format. It's less easily searchable, but similarly, you'll find that most or all LoRAs on HuggingFace are in safetensors format. And finally, Automatic1111, the de facto Stable Diffusion web UI, uses safetensors-format LoRAs.

Maybe it's "legacy", but it's what the Stable Diffusion community uses, overwhelmingly. Choosing not to support safetensors LoRAs is effectively choosing not to support LoRAs at all.

ChrGriffin avatar Feb 26 '23 04:02 ChrGriffin

With all the work that has already been done here, and all the knowledge gathered, what is the state of things? I am really looking forward to using Loras with Invoke, as personally, I much prefer the Invoke UI over A1111.

FWIW, I do agree with the general sentiment here that supporting the de facto standard format is a lot more valuable, and efforts to support diffusers can be done in follow-up changes.

simonfuhrmann avatar Mar 04 '23 18:03 simonfuhrmann

Just wanted to post an update.

With the talk of a code freeze for the implementation of nodes, I have paused on doing much here.

There also appears to be another evolution of LoRA worth keeping an eye on: https://github.com/KohakuBlueleaf/LyCORIS. Right now there are various implementations, but the kohya method seems to be the standard; even LyCORIS is using it as a base. I am going to keep an eye on things for the time being and will make revisions here once things settle a little.

felorhik avatar Mar 09 '23 22:03 felorhik

Disappointing to see Invoke lagging so far behind other solutions in the SD space, but I do understand the perspective of waiting for a "settled" solution before implementing anything. I wonder if a system for user-created extensions, like A1111 has, would ease these issues. Then users could develop or install extensions for whichever LoRA implementation they use, and the core Invoke codebase wouldn't be polluted with three or four different ways of loading LoRAs.

ChrGriffin avatar Mar 10 '23 00:03 ChrGriffin