
missing metadata when converting safetensors

Open ssube opened this issue 3 years ago • 5 comments

I'm trying to use this repo to merge a bunch of LoRA weights into their base models, as the first step in a long and grueling conversion to ONNX. It's working for some files, but failing on many of the .safetensors files that I try, complaining about a lack of metadata for the weights.

The example in the diffusers docs, https://huggingface.co/sayakpaul/sd-model-finetuned-lora-t4, works just fine and produces a merged model directory. I've been grabbing files tagged with LoRA from Civitai for testing, and https://civitai.com/models/8039/jackscape-samurai-jack-background-style-lora is a smaller one that fails. I checked out develop and installed it in a venv, but there's no difference between the latest develop and the last release.

(venv) ssube@compute-infer-1:~/lora$ git rev-parse HEAD
71c8c1dba595d77d0eabdf9c278630168e5a8ce1
(venv) ssube@compute-infer-1:~/lora$ wget https://civitai.com/api/download/models/9482                                                                                              
(venv) ssube@compute-infer-1:~/lora$ mv 9482 jack.safetensors
(venv) ssube@compute-infer-1:~/lora$ lora_add runwayml/stable-diffusion-v1-5 ./jack.safetensors ~/onnx-web/models/diffusion-sd-v1-5-jack  0.8 --mode upl
Lora Add, mode upl
Merging UNET/CLIP from runwayml/stable-diffusion-v1-5 with LoRA from ./jack.safetensors to /home/ssube/onnx-web/models/diffusion-sd-v1-5-jack. Merging ratio : 0.8.
Fetching 15 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 53227.21it/s]
Cannot initialize model with low cpu memory usage because `accelerate` was not found in the environment. Defaulting to `low_cpu_mem_usage=False`. It is strongly recommended to install `accelerate` for faster and less memory-intense model loading. You can do so with:

pip install accelerate

.
/home/ssube/lora/venv/lib/python3.10/site-packages/transformers/models/clip/feature_extraction_clip.py:28: FutureWarning: The class CLIPFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use CLIPImageProcessor instead.
  warnings.warn(
Traceback (most recent call last):
  File "/home/ssube/lora/venv/bin/lora_add", line 33, in <module>
    sys.exit(load_entry_point('lora-diffusion', 'console_scripts', 'lora_add')())
  File "/home/ssube/lora/lora_diffusion/cli_lora_add.py", line 201, in main
    fire.Fire(add)
  File "/home/ssube/lora/venv/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/ssube/lora/venv/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/ssube/lora/venv/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/ssube/lora/lora_diffusion/cli_lora_add.py", line 133, in add
    patch_pipe(loaded_pipeline, path_2)
  File "/home/ssube/lora/lora_diffusion/lora.py", line 1012, in patch_pipe
    monkeypatch_or_replace_safeloras(pipe, safeloras)
  File "/home/ssube/lora/lora_diffusion/lora.py", line 800, in monkeypatch_or_replace_safeloras
    loras = parse_safeloras(safeloras)
  File "/home/ssube/lora/lora_diffusion/lora.py", line 565, in parse_safeloras
    raise ValueError(
ValueError: Tensor lora_te_text_model_encoder_layers_0_mlp_fc1.alpha has no metadata - is this a Lora safetensor?

The file is a valid safetensor: I can load and inspect it with the library, but it doesn't have a whole lot of metadata:

>>> t = safetensors.safe_open("./jack.safetensors", framework="pt")
>>> t.metadata()
{
  'ss_batch_size_per_device': '1', 
  'ss_bucket_info': 'null', 
  'ss_cache_latents': 'True', 
  'ss_clip_skip': '2', 
  'ss_color_aug': 'False', 
  'ss_dataset_dirs': '{
    "100_jackscape": {
      "n_repeats": 100, 
      "img_count": 15
    }
  }', 
  'ss_enable_bucket': 'False', 
  'ss_epoch': '1', 
  'ss_flip_aug': 'False', 
  'ss_full_fp16': 'False', 
  'ss_gradient_accumulation_steps': '1', 
  'ss_gradient_checkpointing': 'True', 
  'ss_keep_tokens': 'None', 
  'ss_learning_rate': '0.0001', 
  'ss_lr_scheduler': 'constant', 
  'ss_lr_warmup_steps': '0', 
  'ss_max_bucket_reso': 'None', 
  'ss_max_token_length': 'None', 
  'ss_max_train_steps': '1500', 
  'ss_min_bucket_reso': 'None', 
  'ss_mixed_precision': 'fp16', 
  'ss_network_alpha': '8.0', 
  'ss_network_dim': '8', 
  'ss_network_module': 'networks.lora', 
  'ss_num_batches_per_epoch': '1500', 
  'ss_num_epochs': '1', 
  'ss_num_reg_images': '0', 
  'ss_num_train_images': '1500', 
  'ss_output_name': 'jackscape', 
  'ss_random_crop': 'False', 
  'ss_reg_dataset_dirs': '{}', 
  'ss_resolution': '(512, 512)', 
  'ss_sd_model_name': 'runwayml/stable-diffusion-v1-5', 
  'ss_seed': '1234', 
  'ss_session_id': '1978538326', 
  'ss_shuffle_caption': 'False', 
  'ss_tag_frequency': '{"100_jackscape": {"a castle tower surrounded by evergreen trees": 1, " in the style of [name]": 15, "the illustration shows a castle and water way surrounded by a mountain": 1, "an animation photo of a person on a bridge": 1, "an illustration of a beach on a rocky coast": 1, "a mountain scene with a smoke stack rising": 1, "a painting shows some mountains and birds flying over them": 1, "red city at dusk with an alien looking clock tower": 1, "a painting of a stream": 1, " with trees and bushes in the background": 1, "this is an illustration of there pink flowers in the park ": 1, "a cartoon - looking city stands next to the window of an old building": 1, "an illustration of a castle on a starry night": 1, "a picture of a lush green country side with water": 1, "illustration of river in a mountainous land with mountains in background": 1, "a large": 1, " grassy area with very tall green plants": 1, "an image of the forest scene at sunset": 1}}', 
  'ss_text_encoder_lr': '5e-05', 
  'ss_total_batch_size': '1', 
  'ss_training_comment': 'None', 
  'ss_training_started_at': '1676123583.4217496', 
  'ss_unet_lr': '0.0001', 
  'ss_v2': 'False', 
  'sshs_legacy_hash': '8d986d85', 
  'sshs_model_hash': '17f43461a751adda7c56082c2c94a9d416568fce3e55c954db6b3f01985e34ab'
}

This looks similar to #141, which was closed by OP with a link to https://huggingface.co/YoungMasterFromSect/Trauter_LoRAs/discussions/3#63c69d6a02d8c96233359025, but that doesn't offer much more detail.

Am I missing something, or are the tensor files?

ssube avatar Feb 16 '23 14:02 ssube

This was made using https://github.com/kohya-ss/sd-scripts and the keys in the file will probably be different.

rockerBOO avatar Feb 17 '23 17:02 rockerBOO

Good to know, thanks. I was aware of that repo, but didn't think to retry the failures there.

Are there some known/significant keys that can safely be used to programmatically tell which scripts produced a particular model?

ssube avatar Feb 18 '23 04:02 ssube

@ssube That would be awesome. We need something similar to an API specification that outlines the various properties of the outputs.

jndietz avatar Feb 19 '23 17:02 jndietz

I suspect the difference is lora_ vs ss_ on the keys, so I'm working on some code to check for that.

I can't promise this is all correct, but most of what I've found so far is documented in https://github.com/ssube/onnx-web/blob/main/docs/converting-models.md
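As a minimal sketch of that check (the function name and heuristics are my own, based only on the key patterns shown in this thread, so treat it as a starting point rather than a definitive detector):

```python
def guess_lora_source(tensor_keys, metadata):
    """Guess which trainer produced a LoRA safetensors file.

    tensor_keys / metadata would come from
    safetensors.safe_open(path, framework="pt").keys() / .metadata().
    """
    metadata = metadata or {}
    # kohya-ss sd-scripts: named keys like "lora_te_..." / "lora_unet_...",
    # plus "ss_*" training metadata in the file header
    if any(k.startswith(("lora_te_", "lora_unet_")) for k in tensor_keys):
        return "sd-scripts"
    # cloneofsimo/lora: indexed keys like "text_encoder:0:up"
    if any(k.endswith((":up", ":down")) for k in tensor_keys):
        return "cloneofsimo"
    if any(k.startswith("ss_") for k in metadata):
        return "sd-scripts"
    return "unknown"
```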

ssube avatar Feb 19 '23 18:02 ssube

Can we use the monkeypatch on a LoRA trained by sd-scripts?

leondelee avatar Feb 21 '23 15:02 leondelee

Still having this issue. Please fix this, or help us use a LoRA trained with the kohya-ss GUI with diffusers.

I don't want to install AUTOMATIC1111 webUI just to run Lora inference...

kopyl avatar Mar 28 '23 03:03 kopyl

@ssube

  1. it converts that LoRA to a HUUUGE file which nobody needs. We need to use a small LoRA
  2. If I convert the output model to diffusers and run inference with the Diffusers library, it just gives me much worse results than using a kohya-ss LoRA in the AUTOMATIC1111 webUI

kopyl avatar Mar 28 '23 03:03 kopyl

I've made some progress and learned some things, but many of them are for ONNX models.

It's possible to load and blend the LoRA weights with the base model at runtime, in either PyTorch or ONNX format, as long as you have the correct node names: https://github.com/ssube/onnx-web/issues/213

The LoRAs produced by sd-scripts have all of the necessary names, but the ones from this repo seem to use the index instead, e.g. https://github.com/cloneofsimo/lora/blob/master/lora_diffusion/lora.py#L301 . That makes it a little bit more difficult to find the right nodes, but the math is otherwise the same.

ssube avatar Mar 29 '23 04:03 ssube

@ssube were you able to use the LoRA at runtime with Python?

Please show an example ❤️

kopyl avatar Mar 30 '23 00:03 kopyl

Hi, I am also encountering the problem that I can't use LoRAs from Civitai. My problem is that the keys of the LoRAs do not match the keys used in this repo. Is there any way to convert a Civitai-type LoRA to the type of LoRA which is supported by this repo?

JamesSand avatar Apr 16 '23 10:04 JamesSand

My original question has been answered, so I'm going to close this issue, but here's a quick dump of everything I learned along the way:

  • the LoRA files produced by this repo use the node index rather than name
    • as long as the model is in the same structure/format, you can walk through the nn modules and pop a LoRA weight from the list when you encounter the correct type of module
  • ones produced by sd-scripts and descendants use the node name
    • no count/pop, just walk through the model and match the names
  • for ONNX models, the names have been adjusted slightly and a lot of the weights end up in model.graph.initializer, named as onnx::MatMul with a numeric suffix
    • you can find the corresponding node in model.graph.node and use that node's name to resolve the correct initializers/weights
  • once you've located the weights, they can be adjusted using the normal maths
    • the math is different for each module/node type, but I'm not sure if it varies between ranks (haven't tested enough models yet)
    • for a nn.Linear and conventional LoRA, you want to recreate or simply call https://github.com/cloneofsimo/lora/blob/master/lora_diffusion/lora.py#L53
    • for nn.Conv2D and conventional LoRA, that's https://github.com/cloneofsimo/lora/blob/master/lora_diffusion/lora.py#L130
    • for LyCORIS, all of the formulas for recomposing weights are in https://github.com/KohakuBlueleaf/LyCORIS/blob/53119eb852b0450ab9294437d78dcdfa0d58a0dc/lycoris/utils.py#L368
    • this is still a little rough, but what I'm using: https://github.com/ssube/onnx-web/blob/main/api/onnx_web/convert/diffusion/lora.py#L76
  • you can merge many LoRAs and Textual Inversions into a single model, but it will eventually start to glitch out
    • faces break first, with white/blue/black spots appearing
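For the nn.Linear case, the "normal maths" boil down to adding the scaled product of the up and down matrices to the base weight. A hedged sketch (NumPy for illustration, helper name my own; for a 1x1 nn.Conv2D the same formula applies after squeezing the trailing spatial dims):

```python
import numpy as np

def merge_linear_lora(weight, up, down, alpha=None, ratio=1.0):
    """Return weight + ratio * scale * (up @ down).

    weight: (out, in); up: (out, rank); down: (rank, in).
    scale = alpha / rank when the LoRA carries an alpha, else 1.0.
    ratio is the user-chosen merge strength (e.g. the 0.8 above).
    """
    rank = down.shape[0]
    scale = (alpha / rank) if alpha is not None else 1.0
    return weight + ratio * scale * (up @ down)
```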

Some example keys for this repo:

>>> m = safetensors.torch.load_file("lora_disney.safetensors")
>>> list(m.keys())[0:20]
[
  '<s1>', 
  '<s2>', 
  'text_encoder:0:down', 
  'text_encoder:0:up', 
  'text_encoder:10:down', 
  'text_encoder:10:up', 
  'text_encoder:11:down', 
  'text_encoder:11:up', 
  'text_encoder:12:down', 
  'text_encoder:12:up', 
  'text_encoder:13:down', 
  'text_encoder:13:up', 
  'text_encoder:14:down', 
  'text_encoder:14:up', 
  'text_encoder:15:down', 
  'text_encoder:15:up', 
  'text_encoder:16:down', 
  'text_encoder:16:up', 
  'text_encoder:17:down', 
  'text_encoder:17:up'
]

From another LoRA, not from this repo:

>>> m = safetensors.torch.load_file("/opt/onnx-web/models/lora/apollo-car.safetensors")
>>> list(m.keys())[0:20]
[
  'lora_te_text_model_encoder_layers_0_mlp_fc1.alpha',
  'lora_te_text_model_encoder_layers_0_mlp_fc1.lora_down.weight', 
  'lora_te_text_model_encoder_layers_0_mlp_fc1.lora_up.weight', 
  'lora_te_text_model_encoder_layers_0_mlp_fc2.alpha', 
  'lora_te_text_model_encoder_layers_0_mlp_fc2.lora_down.weight', 
  'lora_te_text_model_encoder_layers_0_mlp_fc2.lora_up.weight', 
  'lora_te_text_model_encoder_layers_0_self_attn_k_proj.alpha', 
  'lora_te_text_model_encoder_layers_0_self_attn_k_proj.lora_down.weight', 
  'lora_te_text_model_encoder_layers_0_self_attn_k_proj.lora_up.weight', 
  'lora_te_text_model_encoder_layers_0_self_attn_out_proj.alpha', 
  'lora_te_text_model_encoder_layers_0_self_attn_out_proj.lora_down.weight', 
  'lora_te_text_model_encoder_layers_0_self_attn_out_proj.lora_up.weight', 
  'lora_te_text_model_encoder_layers_0_self_attn_q_proj.alpha', 
  'lora_te_text_model_encoder_layers_0_self_attn_q_proj.lora_down.weight', 
  'lora_te_text_model_encoder_layers_0_self_attn_q_proj.lora_up.weight', 
  'lora_te_text_model_encoder_layers_0_self_attn_v_proj.alpha', 
  'lora_te_text_model_encoder_layers_0_self_attn_v_proj.lora_down.weight', 
  'lora_te_text_model_encoder_layers_0_self_attn_v_proj.lora_up.weight', 
  'lora_te_text_model_encoder_layers_10_mlp_fc1.alpha', 
  'lora_te_text_model_encoder_layers_10_mlp_fc1.lora_down.weight'
]
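Keys like these split mechanically into a network prefix, a flattened module name, and a parameter suffix. A rough sketch (my own helper; note that mapping the flattened name back to a dotted path like `text_model.encoder.layers.0.mlp.fc1` requires matching against the real module tree, because underscores in the flattened form are ambiguous):

```python
def split_kohya_key(key):
    """Split a kohya-ss style LoRA key into (network, module, param).

    e.g. "lora_te_text_model_encoder_layers_0_mlp_fc1.lora_down.weight"
      -> ("te", "text_model_encoder_layers_0_mlp_fc1", "lora_down.weight")
    """
    base, _, param = key.partition(".")  # param: "alpha", "lora_down.weight", ...
    if not base.startswith("lora_"):
        raise ValueError(f"not a kohya-style key: {key}")
    rest = base[len("lora_"):]
    network, _, module = rest.partition("_")  # network: "te" or "unet"
    return network, module, param
```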

And from a Hadamard-product LyCORIS, just for completeness:

[
  'lora_te_text_model_encoder_layers_0_mlp_fc1.alpha', 
  'lora_te_text_model_encoder_layers_0_mlp_fc1.hada_w1_a', 
  'lora_te_text_model_encoder_layers_0_mlp_fc1.hada_w1_b', 
  'lora_te_text_model_encoder_layers_0_mlp_fc1.hada_w2_a', 
  'lora_te_text_model_encoder_layers_0_mlp_fc1.hada_w2_b', 
  'lora_te_text_model_encoder_layers_0_mlp_fc2.alpha', 
  'lora_te_text_model_encoder_layers_0_mlp_fc2.hada_w1_a', 
  'lora_te_text_model_encoder_layers_0_mlp_fc2.hada_w1_b', 
  'lora_te_text_model_encoder_layers_0_mlp_fc2.hada_w2_a', 
  'lora_te_text_model_encoder_layers_0_mlp_fc2.hada_w2_b', 
  'lora_te_text_model_encoder_layers_0_self_attn_k_proj.alpha', 
  'lora_te_text_model_encoder_layers_0_self_attn_k_proj.hada_w1_a', 
  'lora_te_text_model_encoder_layers_0_self_attn_k_proj.hada_w1_b', 
  'lora_te_text_model_encoder_layers_0_self_attn_k_proj.hada_w2_a', 
  'lora_te_text_model_encoder_layers_0_self_attn_k_proj.hada_w2_b', 
  'lora_te_text_model_encoder_layers_0_self_attn_out_proj.alpha', 
  'lora_te_text_model_encoder_layers_0_self_attn_out_proj.hada_w1_a', 
  'lora_te_text_model_encoder_layers_0_self_attn_out_proj.hada_w1_b', 
  'lora_te_text_model_encoder_layers_0_self_attn_out_proj.hada_w2_a', 
  'lora_te_text_model_encoder_layers_0_self_attn_out_proj.hada_w2_b'
]
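For those `hada_*` keys, the recomposition is the Hadamard (element-wise) product of two low-rank products, per the LyCORIS utils linked above. A sketch of my understanding (NumPy, helper name my own, no Tucker decomposition handling):

```python
import numpy as np

def hada_delta(w1_a, w1_b, w2_a, w2_b, alpha=None):
    """Recompose a Hadamard-product (LoHa) weight delta:

    delta_W = (w1_a @ w1_b) * (w2_a @ w2_b) * scale

    with scale = alpha / rank (rank taken from w1_b) when alpha is set.
    """
    rank = w1_b.shape[0]
    scale = (alpha / rank) if alpha is not None else 1.0
    return (w1_a @ w1_b) * (w2_a @ w2_b) * scale
```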

ssube avatar Apr 17 '23 13:04 ssube