diffusers
Can't load flux-fill-lora with FluxControl
Describe the bug
I am trying to load a LoRA that was trained with the Flux Fill pipeline using FluxControlInpaintPipeline, but the LoRA fails to load into the transformer. I want to use the Flux Fill pipeline together with control conditioning. Any advice is appreciated.
Reproduction
Download a sample Flux Fill LoRA:
wget https://huggingface.co/WensongSong/Insert-Anything/resolve/main/20250321_steps5000_pytorch_lora_weights.safetensors
Script:
import os
import torch
from pipeline import FluxControlInpaintPipeline
from diffusers.utils import load_image, make_image_grid
from image_gen_aux import DepthPreprocessor # https://github.com/huggingface/image_gen_aux
from PIL import Image
import numpy as np
pipe = FluxControlInpaintPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Fill-dev",
    torch_dtype=torch.bfloat16,
)
# ---------------------------------------------------------------
pipe.to("cuda")
pipe.load_lora_weights("black-forest-labs/FLUX.1-Depth-dev-lora")
pipe.load_lora_weights("20250321_steps5000_pytorch_lora_weights.safetensors")
prompt = "a blue robot singing opera with human-like expressions"
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")
redux_img = load_image("bottom_flatlay.jpg")
head_mask = np.zeros_like(image)
head_mask[65:580,300:642] = 255
mask_image = Image.fromarray(head_mask)
processor = DepthPreprocessor.from_pretrained("LiheYoung/depth-anything-large-hf")
control_image = processor(image)[0].convert("RGB")
output = pipe(
    prompt=prompt,
    image=image,
    control_image=control_image,
    mask_image=mask_image,
    num_inference_steps=30,
    strength=0.9,
    guidance_scale=10.0,
    generator=torch.Generator().manual_seed(42),
).images[0]
make_image_grid([image, control_image, mask_image, output.resize(image.size)], rows=1, cols=4).save("output.png")
Logs
Loading pipeline components...:  29%|██▉       | 2/7 [00:00<00:00, 17.92it/s]
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading checkpoint shards: 100%|██████████| 3/3 [00:00<00:00, 58.60it/s]
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 89.39it/s]
Loading pipeline components...: 100%|██████████| 7/7 [00:00<00:00, 9.36it/s]
No LoRA keys associated to CLIPTextModel found with the prefix='text_encoder'. This is safe to ignore if LoRA state dict didn't originally have any CLIPTextModel related params. You can also try specifying `prefix=None` to resolve the warning. Otherwise, open an issue if you think it's unexpected: https://github.com/huggingface/diffusers/issues/new
/workspace/flux-cluster/venv/lib/python3.10/site-packages/peft/tuners/tuners_utils.py:168: UserWarning: Already found a `peft_config` attribute in the model. This will lead to having multiple adapters in the model. Make sure to know what you are doing!
  warnings.warn(
/workspace/flux-cluster/venv/lib/python3.10/site-packages/peft/tuners/tuners_utils.py:837: UserWarning: Adapter default_1 was active which is now deleted. Setting active adapter to default_0.
  warnings.warn(
Loading default_1 was unsucessful with the following error:
Error(s) in loading state_dict for FluxTransformer2DModel:
size mismatch for x_embedder.lora_A.default_1.weight: copying a param with shape torch.Size([256, 384]) from checkpoint, the shape in current model is torch.Size([256, 128]).
Traceback (most recent call last):
  File "/workspace/flux-cluster/main.py", line 22, in <module>
    pipe.load_lora_weights("20250321_steps5000_pytorch_lora_weights.safetensors")
  File "/usr/local/lib/python3.10/dist-packages/diffusers/loaders/lora_pipeline.py", line 1853, in load_lora_weights
    self.load_lora_into_transformer(
  File "/usr/local/lib/python3.10/dist-packages/diffusers/loaders/lora_pipeline.py", line 1944, in load_lora_into_transformer
    transformer.load_lora_adapter(
  File "/usr/local/lib/python3.10/dist-packages/diffusers/loaders/peft.py", line 352, in load_lora_adapter
    incompatible_keys = set_peft_model_state_dict(self, state_dict, adapter_name, **peft_kwargs)
  File "/workspace/flux-cluster/venv/lib/python3.10/site-packages/peft/utils/save_and_load.py", line 443, in set_peft_model_state_dict
    load_result = model.load_state_dict(peft_model_state_dict, strict=False, assign=True)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2581, in load_state_dict
    raise RuntimeError(
RuntimeError: Error(s) in loading state_dict for FluxTransformer2DModel:
    size mismatch for x_embedder.lora_A.default_1.weight: copying a param with shape torch.Size([256, 384]) from checkpoint, the shape in current model is torch.Size([256, 128]).
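The mismatch above is on x_embedder.lora_A. For a LoRA on a linear layer, the lora_A weight has shape (rank, in_features), so the checkpoint appears to have been trained against 384 input features while the current model exposes 128. A minimal sketch for checking this before calling load_lora_weights, reading only the safetensors header. The helper names read_safetensors_shapes and lora_a_in_features are hypothetical, and matching keys on the substrings "x_embedder" and "lora_A" is an assumption about how the checkpoint names its tensors.

```python
import json
import struct
from typing import Dict, List


def read_safetensors_shapes(path: str) -> Dict[str, List[int]]:
    """Return {tensor_name: shape} from a .safetensors file without loading tensors.

    The file starts with an 8-byte little-endian header length, followed by a
    JSON header that maps each tensor name to its dtype, shape and data offsets.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return {name: info["shape"] for name, info in header.items() if name != "__metadata__"}


def lora_a_in_features(shapes: Dict[str, List[int]], module: str = "x_embedder") -> Dict[str, int]:
    """Pick the lora_A weights of one module and report their in_features (dim 1).

    Assumption: key names contain both the module name and "lora_A".
    """
    return {
        name: shape[1]
        for name, shape in shapes.items()
        if module in name and "lora_A" in name and len(shape) == 2
    }
```

Usage would look like lora_a_in_features(read_safetensors_shapes("20250321_steps5000_pytorch_lora_weights.safetensors")); the reported value can then be compared with pipe.transformer.x_embedder.in_features.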
System Info
I am using the following packages:
torch
torchvision
diffusers
transformers
accelerate==0.33.0
sentencepiece==0.2.0
protobuf==5.27.3
numpy<2
deepspeed==0.14.4
einops==0.8.0
huggingface-hub
pandas
opencv-python==4.10.0.84
supervision
cog
git+https://github.com/huggingface/peft.git
pillow
requests
loguru
python-dotenv
controlnet-aux
xformers
Who can help?
No response
Hi @hardikdava, the Flux Control pipelines are meant to be used with Flux Control models. Can you try changing your pipeline to FluxFillPipeline?
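A rough sketch of what that switch might look like, assuming the LoRA file from the reproduction loads cleanly into the Fill transformer. The helper name run_fill_with_lora is hypothetical, and the step count and guidance_scale are illustrative values, not tuned settings. Note that this path takes no control_image.

```python
def run_fill_with_lora(image, mask_image, prompt, lora_path, device="cuda"):
    """Inpaint `image` inside `mask_image` with FluxFillPipeline plus a Fill-trained LoRA.

    `image` and `mask_image` are PIL images of the same size.
    """
    # Local imports so that defining this helper does not require the GPU stack.
    import torch
    from diffusers import FluxFillPipeline

    pipe = FluxFillPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-Fill-dev",
        torch_dtype=torch.bfloat16,
    )
    # Only the Fill-trained LoRA is loaded; the Depth control LoRA is left out.
    pipe.load_lora_weights(lora_path)
    pipe.to(device)

    result = pipe(
        prompt=prompt,
        image=image,
        mask_image=mask_image,
        height=image.height,
        width=image.width,
        num_inference_steps=30,  # illustrative value
        guidance_scale=30.0,     # illustrative value
        generator=torch.Generator("cpu").manual_seed(42),
    )
    return result.images[0]
```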
@DN6 I tried, but since my LoRA was trained with the Flux Fill pipeline, it is not directly compatible with other models such as flux-dev, canny-dev, or depth-dev. I learned that the flux-fill-dev transformer has 16 input channels and the others have 48. That is why flux-fill-dev cannot be combined with the other models, and why a LoRA has to be trained separately for it.
Is there an example of Flux Fill working with Control LoRAs that you are trying to replicate?
Off the top of my head I'm not sure how well this approach will work since both Flux Fill and Flux Control concatenate the conditioning masks/depth map along the channel dimension. Usually there isn't too much benefit from combining these conditionings since inpainting uses the surrounding image as context.
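To make the channel concatenation concrete, here is a rough sketch of the arithmetic. The breakdown is an assumption about how the diffusers Flux pipelines assemble the transformer input (16 VAE latent channels, 2x2 patch packing, and for Fill a mask folded from 8x8 pixel blocks); it is not stated in this thread, although the totals match the 384 and 128 in the error above. All names below are illustrative.

```python
# Assumed constants (not from this thread): VAE latent channels, patch size,
# and the VAE spatial downscale factor.
LATENT_CHANNELS = 16
PATCH_SIZE = 2
VAE_SCALE_FACTOR = 8


def packed(channels: int) -> int:
    """Channels after folding each PATCH_SIZE x PATCH_SIZE patch into the channel dim."""
    return channels * PATCH_SIZE * PATCH_SIZE


# Plain text-to-image: only the noisy latents.
base_in = packed(LATENT_CHANNELS)

# Control (Canny/Depth): noisy latents + control-image latents.
control_in = packed(LATENT_CHANNELS) + packed(LATENT_CHANNELS)

# Fill: noisy latents + masked-image latents + mask, where each 8x8 pixel
# block of the mask is folded into 64 channels before patch packing.
mask_channels = VAE_SCALE_FACTOR * VAE_SCALE_FACTOR
fill_in = packed(LATENT_CHANNELS) + packed(LATENT_CHANNELS) + packed(mask_channels)

print(base_in, control_in, fill_in)  # 64 128 384
```

Under these assumptions the two conditionings occupy different input layouts, so a LoRA whose x_embedder weights were trained for one layout does not fit a model configured for the other.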
@DN6 Unfortunately, there are currently no Control LoRAs that support Flux Fill. I have a solution that works quite well for my application, but I want to add ControlNet to it to get conditioned output.
I don't think a Control LoRA would work here. We could look into adding ControlNet support for Flux Fill. cc'ing @asomoza to get his thoughts on this.
I think adding ControlNet support for Flux Fill is worth it. I've seen people who use the Flux Fill model for everything so they don't need to swap models, since they're big. Also, using ControlNet with inpainting/outpainting is very common and powerful.