SDXL Fooocus Inpaint

Open WaterKnight1998 opened this issue 1 year ago • 50 comments

Is your feature request related to a problem? Please describe. I have seen that the diffusers StableDiffusionXLInpaintPipeline generates worse results than the SD 1.5 pipeline.

Describe the solution you'd like. Include the Fooocus inpaint patch; it could be specified through a new loader. The weights are already available on the Hub: https://huggingface.co/lllyasviel/fooocus_inpaint

WaterKnight1998 avatar Jan 11 '24 10:01 WaterKnight1998

They also seem to use fooocus_inpaint_head.pth. I'm not quite sure what it does; from reading the code, it may be an additional patch for the UNet. [image]

inpaint_v26.fooocus.patch looks more like a LoRA: the first 50% of the sampling steps run base_model + LoRA, and the last 50% run base_model alone. In my experience Fooocus gives the best inpainting results and diffusers is the fastest, so it would be perfect if they could be combined.

Laidawang avatar Jan 12 '24 05:01 Laidawang

Actually it seems more like a controlnet, something more like this one: https://huggingface.co/destitech/controlnet-inpaint-dreamer-sdxl.

They also use a custom sampler for the inpainting, but I agree, it would be nice to be able to use those in diffusers.

You can read about it here: https://github.com/lllyasviel/Fooocus/discussions/414

asomoza avatar Jan 12 '24 06:01 asomoza

The inpaint_v26.fooocus.patch is more similar to a lora, and then the first 50% executes base_model + lora, and the last 50% executes base_model. There is no doubt that fooocus has the best inpainting effect and diffusers has the fastest speed, it would be perfect if they could be combined.

I was reading the code and they download the model here: https://github.com/lllyasviel/Fooocus/blob/dc5b5238c83c63b4d7814ba210da074ddc341213/modules/config.py#L398-L399

This function is called here: https://github.com/lllyasviel/Fooocus/blob/dc5b5238c83c63b4d7814ba210da074ddc341213/modules/async_worker.py#L301 You can see that inpaint_patch_model_path is passed to base_model_additional_loras. They have some strange code for applying the LoRA.

After the model is loaded, you can see in the code that follows that they apply the head on top of the result of applying the LoRA.

WaterKnight1998 avatar Jan 12 '24 09:01 WaterKnight1998

Actually it seems more like a controlnet, something more like this one: https://huggingface.co/destitech/controlnet-inpaint-dreamer-sdxl.

They also use a custom sampler for the inpainting, but I agree, it would be nice to be able to use those in diffusers.

You can read about it here: lllyasviel/Fooocus#414

I compared how Fooocus and ComfyUI load LoRAs, and I think they are basically the same. COMFY: https://github.com/comfyanonymous/ComfyUI/blob/53c8a99e6c00b5e20425100f6680cd9ea2652218/comfy/lora.py#L13 FOOOCUS: https://github.com/lllyasviel/Fooocus/blob/dc5b5238c83c63b4d7814ba210da074ddc341213/ldm_patched/modules/lora.py#L13

This can also be confirmed from the code linked by @WaterKnight1998. The patch just uses different key names, so that only Fooocus can load it correctly.

Laidawang avatar Jan 12 '24 09:01 Laidawang

Yup, that's the problem I saw. I had a difficult time trying to load it in diffusers; I didn't manage to map the layer keys into the format diffusers expects :(

WaterKnight1998 avatar Jan 12 '24 09:01 WaterKnight1998

https://github.com/lllyasviel/Fooocus/blob/main/modules/inpaint_worker.py#L187 Another thing worth considering is how to implement this patch for inpaint head model.

Laidawang avatar Jan 12 '24 09:01 Laidawang

Actually it seems more like a controlnet, something more like this one: https://huggingface.co/destitech/controlnet-inpaint-dreamer-sdxl. They also use a custom sampler for the inpainting, but I agree, it would be nice to be able to use those in diffusers. You can read about it here: lllyasviel/Fooocus#414

I have read the comparison between Fooocus and comfyui of loading lora. I think they are basically the same. COMFY: https://github.com/comfyanonymous/ComfyUI/blob/53c8a99e6c00b5e20425100f6680cd9ea2652218/comfy/lora.py#L13 FOOOCUS: https://github.com/lllyasviel/Fooocus/blob/dc5b5238c83c63b4d7814ba210da074ddc341213/ldm_patched/modules/lora.py#L13

OK, both implementations are the same. Is it possible to load ComfyUI weights in diffusers?

WaterKnight1998 avatar Jan 12 '24 09:01 WaterKnight1998

https://github.com/lllyasviel/Fooocus/blob/main/modules/inpaint_worker.py#L187 Another thing worth considering is how to implement this patch for inpaint head model.

But the code is just updating the first conv, no?

WaterKnight1998 avatar Jan 12 '24 09:01 WaterKnight1998

https://github.com/lllyasviel/Fooocus/blob/main/modules/inpaint_worker.py#L187 Another thing worth considering is how to implement this patch for inpaint head model.

But the code is just updating the first conv, no?

You are right, but we also need to use it in diffusers as input to start with

Laidawang avatar Jan 12 '24 09:01 Laidawang

Maybe consider loading it in comfy and saving it as overall weights and then using it in diffusers?

Laidawang avatar Jan 12 '24 09:01 Laidawang

But as I saw in fooocus, the base model will still be used in the second stage, so the most elegant way is to load and unload it freely.

Laidawang avatar Jan 12 '24 09:01 Laidawang

But as I saw in fooocus, the base model will still be used in the second stage, so the most elegant way is to load and unload it freely.

What do you mean by this?

WaterKnight1998 avatar Jan 12 '24 09:01 WaterKnight1998

For example, in Fooocus inpainting, assuming 30 sampling steps, xl_base_model + inpainting_model is used for the first 15 steps, and the last 15 steps switch to xl_base_model alone. See here: https://github.com/lllyasviel/Fooocus/blob/main/modules/async_worker.py#L307
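A minimal sketch of the step schedule described above, assuming the switch happens at a fixed fraction of the steps. The function name `patch_active_steps` and the `switch_fraction` parameter are hypothetical and only illustrate the idea; they are not taken from Fooocus or diffusers.

```python
def patch_active_steps(num_steps: int, switch_fraction: float = 0.5) -> list[bool]:
    """For each step index, say whether the inpaint patch would be applied.

    Assumption: the patch is active for the first `switch_fraction` of the
    steps and the plain base model is used afterwards.
    """
    switch_at = int(num_steps * switch_fraction)
    return [step < switch_at for step in range(num_steps)]


schedule = patch_active_steps(30)
# Number of steps with the patch, then number of steps with the base model only.
print(schedule.count(True), schedule.count(False))  # 15 15
```

In a real pipeline the boolean for each step would decide whether the patched or the unpatched UNet weights are used for that denoising step.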

Laidawang avatar Jan 12 '24 10:01 Laidawang

Yeah, I saw it afterwards: they switch to a custom model for inpainting. How good is the inpainting? Can any of you post an example? If it's really good, maybe I can try, or even better, someone from the diffusers team, but they'll probably need solid proof to work on it.

asomoza avatar Jan 12 '24 15:01 asomoza

Before: [image] After: [image] I tried outpainting and it was amazingly realistic. [image] As for inpainting, it blends well with the background.

Laidawang avatar Jan 15 '24 05:01 Laidawang

Maybe consider loading it in comfy and saving it as overall weights and then using it in diffusers?

I tested this today. After exporting, I am not able to load the result with this:

import torch
from diffusers import StableDiffusionXLInpaintPipeline

# Loading the exported UNet-only file; this call raises the error shown below.
pipeline = StableDiffusionXLInpaintPipeline.from_single_file(
    "https://huggingface.co/WaterKnight/fooocus-inpaint/blob/main/fooocus_inpaint_unet.safetensors",
    torch_dtype=torch.float16,
).to("cuda")
generator = torch.Generator(device="cuda").manual_seed(33)

Error:

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py:118, in validate_hf_hub_args.<locals>._inner_fn(*args, **kwargs)
    115 if check_use_auth_token:
    116     kwargs = smoothly_deprecate_use_auth_token(fn_name=fn.__name__, has_token=has_token, kwargs=kwargs)
--> 118 return fn(*args, **kwargs)

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/diffusers/loaders/single_file.py:263, in FromSingleFileMixin.from_single_file(cls, pretrained_model_link_or_path, **kwargs)
    249         file_path = file_path[len("main/") :]
    251     pretrained_model_link_or_path = hf_hub_download(
    252         repo_id,
    253         filename=file_path,
   (...)
    260         force_download=force_download,
    261     )
--> 263 pipe = download_from_original_stable_diffusion_ckpt(
    264     pretrained_model_link_or_path,
    265     pipeline_class=cls,
    266     model_type=model_type,
    267     stable_unclip=stable_unclip,
    268     controlnet=controlnet,
    269     adapter=adapter,
    270     from_safetensors=from_safetensors,
    271     extract_ema=extract_ema,
    272     image_size=image_size,
    273     scheduler_type=scheduler_type,
    274     num_in_channels=num_in_channels,
    275     upcast_attention=upcast_attention,
    276     load_safety_checker=load_safety_checker,
    277     prediction_type=prediction_type,
    278     text_encoder=text_encoder,
    279     text_encoder_2=text_encoder_2,
    280     vae=vae,
    281     tokenizer=tokenizer,
    282     tokenizer_2=tokenizer_2,
    283     original_config_file=original_config_file,
    284     config_files=config_files,
    285     local_files_only=local_files_only,
    286 )
    288 if torch_dtype is not None:
    289     pipe.to(dtype=torch_dtype)

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/convert_from_ckpt.py:1445, in download_from_original_stable_diffusion_ckpt(checkpoint_path_or_dict, original_config_file, image_size, prediction_type, model_type, extract_ema, scheduler_type, num_in_channels, upcast_attention, device, from_safetensors, stable_unclip, stable_unclip_prior, clip_stats_path, controlnet, adapter, load_safety_checker, pipeline_class, local_files_only, vae_path, vae, text_encoder, text_encoder_2, tokenizer, tokenizer_2, config_files)
   1442 unet_config["upcast_attention"] = upcast_attention
   1444 path = checkpoint_path_or_dict if isinstance(checkpoint_path_or_dict, str) else ""
-> 1445 converted_unet_checkpoint = convert_ldm_unet_checkpoint(
   1446     checkpoint, unet_config, path=path, extract_ema=extract_ema
   1447 )
   1449 ctx = init_empty_weights if is_accelerate_available() else nullcontext
   1450 with ctx():

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/convert_from_ckpt.py:426, in convert_ldm_unet_checkpoint(checkpoint, config, path, extract_ema, controlnet, skip_extract_state_dict)
    422                 unet_state_dict[key.replace(unet_key, "")] = checkpoint.pop(key)
    424 new_checkpoint = {}
--> 426 new_checkpoint["time_embedding.linear_1.weight"] = unet_state_dict["time_embed.0.weight"]
    427 new_checkpoint["time_embedding.linear_1.bias"] = unet_state_dict["time_embed.0.bias"]
    428 new_checkpoint["time_embedding.linear_2.weight"] = unet_state_dict["time_embed.2.weight"]

WaterKnight1998 avatar Jan 15 '24 17:01 WaterKnight1998

There seems to be something wrong with the size of the weights. If you only saved unet, you cannot load it through from_single_file.
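One way to check what kind of file was exported is to look at its key names before calling from_single_file. The sketch below is only a diagnostic aid under stated assumptions: the helper `describe_checkpoint_keys` is hypothetical, the "model.diffusion_model." prefix is the one used for the UNet in original-format Stable Diffusion checkpoints, and the "time_embed." and "time_embedding." names are the ones visible in the traceback above.

```python
UNET_PREFIX = "model.diffusion_model."  # UNet prefix in original-format full checkpoints


def describe_checkpoint_keys(keys) -> str:
    """Guess the layout of a checkpoint from its tensor names (heuristic only)."""
    keys = list(keys)
    if any(k.startswith(UNET_PREFIX) for k in keys):
        return "full checkpoint (original layout)"
    if any(k.startswith("time_embed.") for k in keys):
        return "bare UNet (original key names)"
    if any(k.startswith("time_embedding.") for k in keys):
        return "bare UNet (diffusers key names)"
    return "unknown"


# Toy key lists standing in for real files.
print(describe_checkpoint_keys(["model.diffusion_model.time_embed.0.weight"]))
print(describe_checkpoint_keys(["time_embedding.linear_1.weight"]))
```

For a real .safetensors file, the key names can be listed with `safetensors.safe_open(path, framework="pt")` and its `keys()` method, then passed to the helper.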

Laidawang avatar Jan 23 '24 03:01 Laidawang

For example, in fooocus inpainting, assuming that 30 steps of sampling are performed, xl_base_model + inpainting_model will be used in the first 15 steps, and xl_base_model will be switched to separate inference in the last 15 steps. https://github.com/lllyasviel/Fooocus/blob/main/modules/async_worker.py#L307 see here.

I have converted all the LoRAs and the inpaint patch in Fooocus into the diffusers format by matching keys; the inpaint head is also included. Using the realisticstockphotov1.0 diffusers checkpoint on HF, inpainting works.

But comparing my result with the Fooocus Gradio UI result, I found that mine is of lower quality and has less detail. I'm sure that I have removed almost all the tricks in Fooocus, including prompt expansion, sharpness, ADM guidance, etc. The images and masks used in both pipelines are also the same. Any advice?

lawsonxwl avatar Feb 02 '24 16:02 lawsonxwl

For example, in fooocus inpainting, assuming that 30 steps of sampling are performed, xl_base_model + inpainting_model will be used in the first 15 steps, and xl_base_model will be switched to separate inference in the last 15 steps. https://github.com/lllyasviel/Fooocus/blob/main/modules/async_worker.py#L307 see here.

A question, why do you think that the inpaint patch is only used in the first 50% of the sampling?

lawsonxwl avatar Feb 02 '24 16:02 lawsonxwl

For example, in fooocus inpainting, assuming that 30 steps of sampling are performed, xl_base_model + inpainting_model will be used in the first 15 steps, and xl_base_model will be switched to separate inference in the last 15 steps. https://github.com/lllyasviel/Fooocus/blob/main/modules/async_worker.py#L307 see here.

I have converted all loras and the inpaint patch in fooocus into diffusers style format by matching keys, inpaint head is also included, by using realisticstockphotov1.0 diffusers checkpoint on HF, it's ok to do the inpainting task.

But comparing my result with fooocus gradio ui result, I found that my result's quality is worse than fooocus ui, it has less detail, I'm sure that I have removed almost all the tricks in fooocus, including prompt expansion, sharpness, ADM guidance... etc, also, the images and the masks used in both pipeline are the same... any advice??

Could you share this, please?

WaterKnight1998 avatar Feb 05 '24 15:02 WaterKnight1998

Interesting! I'm keeping my eyes on this :) do share your results and findings with us

yiyixuxu avatar Feb 05 '24 19:02 yiyixuxu

@lawsonxwl any news???

WaterKnight1998 avatar Feb 08 '24 13:02 WaterKnight1998

For example, in fooocus inpainting, assuming that 30 steps of sampling are performed, xl_base_model + inpainting_model will be used in the first 15 steps, and xl_base_model will be switched to separate inference in the last 15 steps. https://github.com/lllyasviel/Fooocus/blob/main/modules/async_worker.py#L307 see here.

A question, why do you think that the inpaint patch is only used in the first 50% of the sampling?

I have read the code and I'm sure of this. Also, a message is printed to the console during generation.

Laidawang avatar Feb 18 '24 09:02 Laidawang

For example, in fooocus inpainting, assuming that 30 steps of sampling are performed, xl_base_model + inpainting_model will be used in the first 15 steps, and xl_base_model will be switched to separate inference in the last 15 steps. https://github.com/lllyasviel/Fooocus/blob/main/modules/async_worker.py#L307 see here.

I have converted all loras and the inpaint patch in fooocus into diffusers style format by matching keys, inpaint head is also included, by using realisticstockphotov1.0 diffusers checkpoint on HF, it's ok to do the inpainting task. But comparing my result with fooocus gradio ui result, I found that my result's quality is worse than fooocus ui, it has less detail, I'm sure that I have removed almost all the tricks in fooocus, including prompt expansion, sharpness, ADM guidance... etc, also, the images and the masks used in both pipeline are the same... any advice??

Could you share this, please?

Sorry, I'm restricted by regulations and cannot share the code. If you want to migrate Fooocus to diffusers, you have to check almost all the code in the Fooocus project, which is really overwhelming. After several rounds of optimization, the quality of my pipeline's results is quite close to the Fooocus web UI (in my personal view).

lawsonxwl avatar Feb 18 '24 09:02 lawsonxwl

For example, in fooocus inpainting, assuming that 30 steps of sampling are performed, xl_base_model + inpainting_model will be used in the first 15 steps, and xl_base_model will be switched to separate inference in the last 15 steps. https://github.com/lllyasviel/Fooocus/blob/main/modules/async_worker.py#L307 see here.

A question, why do you think that the inpaint patch is only used in the first 50% of the sampling?

I have read the code and I'm sure of this, and also when generating, it will also have a print in the console.

Yes, you are absolutely right. Do you mind leaving your WeChat? We can talk about this.

lawsonxwl avatar Feb 18 '24 10:02 lawsonxwl

Yes, you are absolutely right. Do you mind leaving your wechat? we can talk about this

laidawang233

Laidawang avatar Feb 18 '24 10:02 Laidawang

if you want another resource to look at:

https://github.com/Acly/comfyui-inpaint-nodes

Adds two nodes which allow using Fooocus inpaint model. It's a small and flexible patch which can be applied to any SDXL checkpoint and will transform it into an inpaint model. This model can then be used like other inpaint models, and provides the same benefits.

It also has other cool stuff for inpainting. I will try those too, and I think that combined with #7038 the inpainting would be really good now.

asomoza avatar Feb 20 '24 12:02 asomoza

@asomoza keep us updated!

yiyixuxu avatar Feb 22 '24 02:02 yiyixuxu

For example, in fooocus inpainting, assuming that 30 steps of sampling are performed, xl_base_model + inpainting_model will be used in the first 15 steps, and xl_base_model will be switched to separate inference in the last 15 steps. https://github.com/lllyasviel/Fooocus/blob/main/modules/async_worker.py#L307 see here.

I have converted all loras and the inpaint patch in fooocus into diffusers style format by matching keys, inpaint head is also included, by using realisticstockphotov1.0 diffusers checkpoint on HF, it's ok to do the inpainting task. But comparing my result with fooocus gradio ui result, I found that my result's quality is worse than fooocus ui, it has less detail, I'm sure that I have removed almost all the tricks in fooocus, including prompt expansion, sharpness, ADM guidance... etc, also, the images and the masks used in both pipeline are the same... any advice??

Could you share this, please?

sorry, as is restricted by the regulation, I cannot share you the code. If you want to migrate fooocus to diffusers,you have to check almost all the code in fooocus project... really overwhelming. After several rounds of optimization, the quality of my pipeline result can be quite close to fooocus webui(In my personal view).

@lawsonxwl @WaterKnight1998 @yiyixuxu

Hi, we at Dashtoon are also working on a custom diffusers pipeline to get the best out of SDXL inpainting. I have also been going through the Fooocus codebase to merge Fooocus's inpaint patch model into the HF diffusers UNet layers. So far, I have managed to add the inpaint head module to the UNet and to merge the inpaint patch model layers into the HF UNet layers by matching keys, as @lawsonxwl mentioned. And yes, it is quite overwhelming to navigate the Fooocus codebase!

One thing to note is that it is not exactly a LoRA. For each key in the set of UNet keys to be updated, it replaces the original pretrained weight tensor (call it w_orig) with a new weight tensor w_new. w_new is computed from three tensors, w1, w_max, and w_min, which come from the inpaint patch model dict (inpaint_v26.fooocus.patch): each key is a UNet key (to be mapped to the diffusers UNet) and each value is a tuple of those three tensors. Then w_new = w_orig + (w1 / 255.0) * (w_max - w_min) + w_min. If w_orig has shape (320, 320, 3, 3), then w1 has the same shape as w_orig, and w_max and w_min both have shape (320, 1, 3, 3). That makes sense, since I believe the formula is really a scale-and-shift operation.
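The formula above can be sketched with NumPy broadcasting. This is only an illustration under assumptions: w1 is taken to be stored as 8-bit integers in [0, 255] (suggested by the division by 255.0 but not confirmed in this thread), the helper name `apply_patch_weight` is hypothetical, and the tensors are random stand-ins rather than real patch weights.

```python
import numpy as np


def apply_patch_weight(w_orig, w1, w_min, w_max):
    """Compute w_new = w_orig + (w1 / 255.0) * (w_max - w_min) + w_min.

    w_min and w_max broadcast over the second axis of w_orig.
    """
    delta = (w1.astype(np.float32) / 255.0) * (w_max - w_min) + w_min
    return w_orig + delta


rng = np.random.default_rng(0)
w_orig = rng.standard_normal((320, 320, 3, 3)).astype(np.float32)
# Assumption: w1 holds quantized values in [0, 255].
w1 = rng.integers(0, 256, size=w_orig.shape, dtype=np.uint8)
w_min = np.full((320, 1, 3, 3), -0.01, dtype=np.float32)
w_max = np.full((320, 1, 3, 3), 0.01, dtype=np.float32)

w_new = apply_patch_weight(w_orig, w1, w_min, w_max)
print(w_new.shape)  # (320, 320, 3, 3)
```

With w1 = 0 the added term equals w_min and with w1 = 255 it equals w_max, which is why the operation reads as a per-slice scale and shift of a quantized delta.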

But the problem is that when I tested the SDXL inpaint pipeline with default parameters and just the inpaint head, I got something like this (first is the input image, second is the mask, third is the image generated by the default SDXL inpaint pipeline without the Fooocus inpaint head, fourth is with the Fooocus inpaint head): [image: output_ihead]

Also, if I use just the inpaint patch model, I currently get something like this: [image: output_patch]

The prompt used for inpainting in both cases was "Young Female, Blue Eyes, Brown Long Hair".

I haven't implemented any other change from Fooocus yet.

@lawsonxwl any idea as to why this might be happening for both the cases? Especially when using the fooocus inpaint patch model. What could I possibly be missing?

quark-toon avatar Mar 08 '24 16:03 quark-toon

@quark-toon I believe you forgot to stop passing the extra inpaint_features to the UNet after you've unloaded the Fooocus LoRA/patch. Also make sure you add the inpaint_features right after conv_in.
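A rough sketch of that gating logic, using NumPy arrays as stand-ins for the real tensors. Everything here is illustrative: the helper `add_inpaint_features`, the toy shapes, and the half-way switch point are assumptions, not code from Fooocus or diffusers.

```python
import numpy as np


def add_inpaint_features(conv_in_out, inpaint_features, patch_active):
    """Add the inpaint-head features to the conv_in output only while the patch is loaded."""
    if patch_active and inpaint_features is not None:
        return conv_in_out + inpaint_features
    return conv_in_out


num_steps = 30
switch_at = num_steps // 2  # assumption: patch unloaded after half of the steps

# Toy stand-ins: zeros for the conv_in output, ones for the inpaint-head features.
conv_in_out = np.zeros((1, 320, 8, 8), dtype=np.float32)
inpaint_features = np.ones((1, 320, 8, 8), dtype=np.float32)

used = []
for step in range(num_steps):
    patch_active = step < switch_at
    hidden = add_inpaint_features(conv_in_out, inpaint_features, patch_active)
    used.append(bool(hidden.any()))

print(sum(used))  # 15
```

In a PyTorch UNet, one possible way to place the addition right after conv_in is a forward hook registered with `register_forward_hook` on that layer, removed or disabled once the patch is unloaded.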

you can also message me in Telegram at bonlime if you want to debug this together

bonlime avatar Mar 08 '24 19:03 bonlime