
Can't load flux-fill-lora with FluxControl

Open hardikdava opened this issue 5 months ago • 6 comments

Describe the bug

I am trying to load a LoRA trained with the Flux Fill pipeline into FluxControlInpaintPipeline, but it fails to load into the transformer. Any advice is appreciated: I want a Flux Fill pipeline with control.

Reproduction

Download a sample Flux Fill LoRA:

wget https://huggingface.co/WensongSong/Insert-Anything/resolve/main/20250321_steps5000_pytorch_lora_weights.safetensors

Script:

import torch
from diffusers import FluxControlInpaintPipeline
from diffusers.utils import load_image, make_image_grid
from image_gen_aux import DepthPreprocessor # https://github.com/huggingface/image_gen_aux
from PIL import Image
import numpy as np


pipe = FluxControlInpaintPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Fill-dev",
    torch_dtype=torch.bfloat16,
)
# ---------------------------------------------------------------
pipe.to("cuda")

pipe.load_lora_weights("black-forest-labs/FLUX.1-Depth-dev-lora")
pipe.load_lora_weights("20250321_steps5000_pytorch_lora_weights.safetensors")

prompt = "a blue robot singing opera with human-like expressions"
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")
redux_img = load_image("bottom_flatlay.jpg")


head_mask = np.zeros_like(image)
head_mask[65:580,300:642] = 255
mask_image = Image.fromarray(head_mask)

processor = DepthPreprocessor.from_pretrained("LiheYoung/depth-anything-large-hf")
control_image = processor(image)[0].convert("RGB")

output = pipe(
    prompt=prompt,
    image=image,
    control_image=control_image,
    mask_image=mask_image,
    num_inference_steps=30,
    strength=0.9,
    guidance_scale=10.0,
    generator=torch.Generator().manual_seed(42),
).images[0]
make_image_grid([image, control_image, mask_image, output.resize(image.size)], rows=1, cols=4).save("output.png")


Logs

Loading pipeline components...:  29%|██▋       | 2/7 [00:00<00:00, 17.92it/s]
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading checkpoint shards: 100%|██████████| 3/3 [00:00<00:00, 58.60it/s]
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 89.39it/s]
Loading pipeline components...: 100%|██████████| 7/7 [00:00<00:00,  9.36it/s]
No LoRA keys associated to CLIPTextModel found with the prefix='text_encoder'. This is safe to ignore if LoRA state dict didn't originally have any CLIPTextModel related params. You can also try specifying `prefix=None` to resolve the warning. Otherwise, open an issue if you think it's unexpected: https://github.com/huggingface/diffusers/issues/new
/workspace/flux-cluster/venv/lib/python3.10/site-packages/peft/tuners/tuners_utils.py:168: UserWarning: Already found a `peft_config` attribute in the model. This will lead to having multiple adapters in the model. Make sure to know what you are doing!
  warnings.warn(
/workspace/flux-cluster/venv/lib/python3.10/site-packages/peft/tuners/tuners_utils.py:837: UserWarning: Adapter default_1 was active which is now deleted. Setting active adapter to default_0.
  warnings.warn(
Loading default_1 was unsucessful with the following error: 
Error(s) in loading state_dict for FluxTransformer2DModel:
        size mismatch for x_embedder.lora_A.default_1.weight: copying a param with shape torch.Size([256, 384]) from checkpoint, the shape in current model is torch.Size([256, 128]).
Traceback (most recent call last):
  File "/workspace/flux-cluster/main.py", line 22, in <module>
    pipe.load_lora_weights("20250321_steps5000_pytorch_lora_weights.safetensors")
  File "/usr/local/lib/python3.10/dist-packages/diffusers/loaders/lora_pipeline.py", line 1853, in load_lora_weights
    self.load_lora_into_transformer(
  File "/usr/local/lib/python3.10/dist-packages/diffusers/loaders/lora_pipeline.py", line 1944, in load_lora_into_transformer
    transformer.load_lora_adapter(
  File "/usr/local/lib/python3.10/dist-packages/diffusers/loaders/peft.py", line 352, in load_lora_adapter
    incompatible_keys = set_peft_model_state_dict(self, state_dict, adapter_name, **peft_kwargs)
  File "/workspace/flux-cluster/venv/lib/python3.10/site-packages/peft/utils/save_and_load.py", line 443, in set_peft_model_state_dict
    load_result = model.load_state_dict(peft_model_state_dict, strict=False, assign=True)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2581, in load_state_dict
    raise RuntimeError(
RuntimeError: Error(s) in loading state_dict for FluxTransformer2DModel:
        size mismatch for x_embedder.lora_A.default_1.weight: copying a param with shape torch.Size([256, 384]) from checkpoint, the shape in current model is torch.Size([256, 128]).
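
One way to spot this mismatch before calling `load_lora_weights` is to read the LoRA file's tensor shapes directly. The sketch below parses only the safetensors header (an 8-byte little-endian length followed by a JSON table of dtypes, shapes, and offsets), so it needs nothing beyond the standard library and never loads the weights. It assumes PEFT/diffusers-style key names containing `x_embedder` and `lora_A`; kohya-style files (`lora_down`) would need a different filter.

```python
import json
import struct
import sys


def read_safetensors_shapes(path):
    """Return {tensor_name: shape} from a .safetensors header without loading tensors."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return {k: v["shape"] for k, v in header.items() if k != "__metadata__"}


def x_embedder_in_features(path):
    """Return the input width (dim 1) of every LoRA A matrix on x_embedder."""
    shapes = read_safetensors_shapes(path)
    return {
        name: shape[1]
        for name, shape in shapes.items()
        if "x_embedder" in name and "lora_A" in name
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    for name, in_features in x_embedder_in_features(sys.argv[1]).items():
        print(f"{name}: in_features={in_features}")
```

Run against the downloaded file (for example `python inspect_lora.py 20250321_steps5000_pytorch_lora_weights.safetensors`, where the script name is hypothetical), the error above suggests it should report `in_features=384`, while the transformer it was being loaded into expected 128.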

System Info

I am using the following packages:

torch
torchvision
diffusers
transformers
accelerate==0.33.0
sentencepiece==0.2.0
protobuf==5.27.3
numpy<2
deepspeed==0.14.4
einops==0.8.0
huggingface-hub
pandas
opencv-python==4.10.0.84
supervision
cog
git+https://github.com/huggingface/peft.git
pillow
requests
loguru
python-dotenv
controlnet-aux
xformers
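
The list above leaves `torch`, `diffusers`, `transformers`, and `huggingface-hub` unpinned and installs `peft` from its main branch, so the exact versions can't be recovered from it. `diffusers-cli env` prints a ready-to-paste environment summary; a minimal standard-library alternative, assuming the distribution names below, is:

```python
from importlib import metadata

# Packages whose versions matter most for the LoRA-loading path in the traceback.
PACKAGES = ["torch", "diffusers", "transformers", "peft", "accelerate", "safetensors"]


def package_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in package_versions(PACKAGES).items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```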

Who can help?

No response

hardikdava • Jun 03 '25 14:06

Hi @hardikdava, the Flux Control pipelines are meant to be used with Flux Control models. Can you try switching to FluxFillPipeline?

DN6 • Jun 06 '25 05:06
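
A minimal sketch of that suggestion, assuming the Insert-Anything LoRA loads cleanly into FluxFillPipeline (it was trained against FLUX.1-Fill-dev) and that the input image and mask are local files passed on the command line. `guidance_scale=30.0`, `num_inference_steps=50`, and `max_sequence_length=512` follow the FluxFillPipeline documentation example. `round_down_to_multiple` is a hypothetical helper that keeps the output size divisible by 16, which is my understanding of what Flux's 8x VAE downsampling plus 2x2 latent packing requires. Note that this drops the depth control entirely.

```python
import sys


def round_down_to_multiple(value, multiple=16):
    """Round a pixel dimension down to a multiple of `multiple` (never below one)."""
    return max(multiple, (value // multiple) * multiple)


def main(image_path, mask_path):
    # Heavy imports stay inside main() so the helper above works without a GPU.
    import torch
    from diffusers import FluxFillPipeline
    from diffusers.utils import load_image

    pipe = FluxFillPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-Fill-dev", torch_dtype=torch.bfloat16
    ).to("cuda")
    # Assumption: trained against FLUX.1-Fill-dev, so its 384-wide x_embedder
    # LoRA matrices should match this transformer.
    pipe.load_lora_weights("20250321_steps5000_pytorch_lora_weights.safetensors")

    image = load_image(image_path)
    mask = load_image(mask_path)  # white pixels mark the region to repaint
    width, height = (round_down_to_multiple(side) for side in image.size)

    result = pipe(
        prompt="a blue robot singing opera with human-like expressions",
        image=image,
        mask_image=mask,
        height=height,
        width=width,
        guidance_scale=30.0,
        num_inference_steps=50,
        max_sequence_length=512,
        generator=torch.Generator("cpu").manual_seed(42),
    ).images[0]
    result.save("fill_output.png")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Usage, with hypothetical file names: `python fill_lora.py input.png mask.png`.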

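@DN6 I tried, but since my LoRA was trained with the Flux Fill pipeline, it is not directly compatible with other models such as flux-dev, canny-dev, or depth-dev. I found out that the flux-fill-dev transformer takes a different number of input channels than the other models (the error above shows 384 input features for the Fill LoRA's x_embedder versus 128 in the currently loaded transformer). That's why flux-fill-dev can't be combined with the other models, and LoRAs have to be trained separately for each.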

hardikdava • Jun 06 '25 06:06

Is there an example of Flux Fill working with Control LoRAs that you are trying to replicate?

Off the top of my head, I'm not sure how well this approach will work, since both Flux Fill and Flux Control concatenate their conditioning (masks or depth maps) along the channel dimension. There usually isn't much benefit from combining these conditionings, because inpainting already uses the surrounding image as context.

DN6 • Jun 06 '25 08:06
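
The channel-dimension concatenation accounts for the numbers in the traceback. A back-of-the-envelope sketch, based on my reading of diffusers' FluxFillPipeline and FluxControlPipeline and assuming 16-channel VAE latents, 2x2 patch packing, and a Fill mask folded 8x8 into channels: the x_embedder input width works out to 64 for FLUX.1-dev, 128 for Depth/Canny, and 384 for Fill, which matches the 384 vs. 128 mismatch above.

```python
LATENT_CHANNELS = 16  # channels of the Flux VAE latent
PATCH = 2             # each token packs a 2x2 patch of latent pixels
VAE_SCALE = 8         # image pixels per latent pixel along each axis


def packed(channels):
    """Features per token after 2x2 packing of a `channels`-channel map."""
    return channels * PATCH * PATCH


def expected_in_features(variant):
    """Input width of the transformer's x_embedder for a Flux variant."""
    noisy = packed(LATENT_CHANNELS)  # the latents being denoised
    if variant == "dev":
        return noisy
    if variant == "control":
        # Depth/Canny: encoded control image concatenated on the channel axis.
        return noisy + packed(LATENT_CHANNELS)
    if variant == "fill":
        # Fill: encoded masked image plus the pixel mask folded 8x8 into channels.
        masked_image = packed(LATENT_CHANNELS)
        mask = packed(VAE_SCALE * VAE_SCALE)
        return noisy + masked_image + mask
    raise ValueError(f"unknown variant: {variant!r}")


for variant in ("dev", "control", "fill"):
    print(variant, expected_in_features(variant))  # dev 64 / control 128 / fill 384
```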

@DN6 Unfortunately, there are currently no Control LoRAs that support Flux Fill. I have a solution that works quite well for my application, but I want to add a ControlNet to it to get conditioned output.

hardikdava • Jun 06 '25 10:06

I don't think a Control LoRA would work here. We could look into adding ControlNet support for Flux Fill. cc @asomoza for his thoughts on this.

DN6 • Jun 12 '25 17:06

I think adding ControlNet support for Flux Fill is worth it. I've seen people who use the Flux Fill model for everything so they don't need to swap models, since the models are big. Using ControlNet with inpainting/outpainting is also very common and powerful.

asomoza • Jun 12 '25 17:06