SDXL Fooocus Inpaint
Is your feature request related to a problem? Please describe. I have seen that the diffusers StableDiffusionXLInpaintPipeline generates worse results than the SD 1.5 inpainting pipeline.
Describe the solution you'd like. Include the Fooocus inpaint patch, which could be exposed through a new loader. The weights are already available on the Hub: https://huggingface.co/lllyasviel/fooocus_inpaint
They also seem to use fooocus_inpaint_head.pth. I'm not quite sure what this does; from reading the code, it looks like an additional patch for the UNet?
The inpaint_v26.fooocus.patch is more like a lora: the first 50% of the steps run base_model + lora, and the last 50% run base_model alone. There is no doubt that Fooocus has the best inpainting quality and diffusers has the fastest speed; it would be perfect if they could be combined.
Actually it seems more like a controlnet, something more like this one: https://huggingface.co/destitech/controlnet-inpaint-dreamer-sdxl.
They also use a custom sampler for the inpainting, but I agree, it would be nice to be able to use those in diffusers.
You can read about it here: https://github.com/lllyasviel/Fooocus/discussions/414
The inpaint_v26.fooocus.patch is more like a lora: the first 50% of the steps run base_model + lora, and the last 50% run base_model alone. There is no doubt that Fooocus has the best inpainting quality and diffusers has the fastest speed; it would be perfect if they could be combined.
I was reading the code and they download the model here: https://github.com/lllyasviel/Fooocus/blob/dc5b5238c83c63b4d7814ba210da074ddc341213/modules/config.py#L398-L399
This function is called here: https://github.com/lllyasviel/Fooocus/blob/dc5b5238c83c63b4d7814ba210da074ddc341213/modules/async_worker.py#L301 You can see inpaint_patch_model_path is passed to base_model_additional_loras. They have some unusual code for applying the lora.
After the model is loaded, you can see in the following lines that they apply the head on top of the result of applying the lora.
Actually it seems more like a controlnet, something more like this one: https://huggingface.co/destitech/controlnet-inpaint-dreamer-sdxl.
They also use a custom sampler for the inpainting, but I agree, it would be nice to be able to use those in diffusers.
You can read about it here: lllyasviel/Fooocus#414
I have compared how Fooocus and ComfyUI load the lora, and I think they are basically the same. COMFY: https://github.com/comfyanonymous/ComfyUI/blob/53c8a99e6c00b5e20425100f6680cd9ea2652218/comfy/lora.py#L13 FOOOCUS: https://github.com/lllyasviel/Fooocus/blob/dc5b5238c83c63b4d7814ba210da074ddc341213/ldm_patched/modules/lora.py#L13
This can also be confirmed from the code provided by @WaterKnight1998. They just defined different key names to ensure that only Fooocus can load it correctly.
Yup, that's the problem I saw. I had a difficult time trying to load it in diffusers; I didn't manage to map the layer keys into the format diffusers expects :(
https://github.com/lllyasviel/Fooocus/blob/main/modules/inpaint_worker.py#L187 Another thing worth considering is how to implement this patch for inpaint head model.
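From my reading of inpaint_worker.py, the head itself looks like a tiny single-conv module; a rough sketch of what I think it does (the (320, 5, 3, 3) weight shape and the 5-channel mask+latent input are my reading of the code, not a verified port):
import torch
import torch.nn.functional as F

class InpaintHead(torch.nn.Module):
    # Sketch of the module fooocus_inpaint_head.pth appears to hold: a single conv weight.
    def __init__(self):
        super().__init__()
        self.head = torch.nn.Parameter(torch.empty(320, 5, 3, 3))

    def forward(self, x):
        # x: concat of the 1-channel latent mask and the 4-channel VAE latent of the masked image
        x = F.pad(x, (1, 1, 1, 1), "replicate")
        # output has 320 channels, same spatial size as the latent
        return F.conv2d(x, weight=self.head)
The 320-channel output then has to be added to the UNet features right after conv_in, which is the part that needs custom plumbing in diffusers.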
Actually it seems more like a controlnet, something more like this one: https://huggingface.co/destitech/controlnet-inpaint-dreamer-sdxl.
They also use a custom sampler for the inpainting, but I agree, it would be nice to be able to use those in diffusers.
You can read about it here: lllyasviel/Fooocus#414
I have compared how Fooocus and ComfyUI load the lora, and I think they are basically the same. COMFY: https://github.com/comfyanonymous/ComfyUI/blob/53c8a99e6c00b5e20425100f6680cd9ea2652218/comfy/lora.py#L13 FOOOCUS: https://github.com/lllyasviel/Fooocus/blob/dc5b5238c83c63b4d7814ba210da074ddc341213/ldm_patched/modules/lora.py#L13
Ok, both codes are the same. Is it possible to load ComfyUI weights in diffusers?
https://github.com/lllyasviel/Fooocus/blob/main/modules/inpaint_worker.py#L187 Another thing worth considering is how to implement this patch for inpaint head model.
But the code is just updating the first conv, no?
https://github.com/lllyasviel/Fooocus/blob/main/modules/inpaint_worker.py#L187 Another thing worth considering is how to implement this patch for inpaint head model.
But the code is just updating the first conv, no?
You are right, but we also need to feed it into diffusers as an input to start with.
Maybe consider loading it in comfy and saving it as overall weights and then using it in diffusers?
But as I saw in fooocus, the base model will still be used in the second stage, so the most elegant way is to load and unload it freely.
But as I saw in fooocus, the base model will still be used in the second stage, so the most elegant way is to load and unload it freely.
What do you mean with this?
For example, in Fooocus inpainting, assuming 30 sampling steps are performed, xl_base_model + inpainting_model is used for the first 15 steps, and the last 15 steps switch back to xl_base_model alone; see here: https://github.com/lllyasviel/Fooocus/blob/main/modules/async_worker.py#L307
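If that reading is right, the schedule maps fairly naturally onto the denoising_end / denoising_start arguments of StableDiffusionXLInpaintPipeline (the same mechanism as the base + refiner flow). A rough sketch, assuming you already have patched_unet (the SDXL UNet with the Fooocus patch merged in) and base_unet (untouched); names and file paths are placeholders:
import torch
from diffusers import StableDiffusionXLInpaintPipeline
from diffusers.utils import load_image

# patched_unet / base_unet are assumed to exist already (hypothetical); inputs are placeholders.
pipe = StableDiffusionXLInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", unet=patched_unet, torch_dtype=torch.float16
).to("cuda")
image = load_image("input.png")
mask = load_image("mask.png")
prompt = "a photo of a cat"

# First 15 of 30 steps with base model + inpaint patch: stop halfway and keep the latents.
latents = pipe(
    prompt=prompt, image=image, mask_image=mask,
    num_inference_steps=30, denoising_end=0.5, output_type="latent",
).images

# Last 15 steps: swap back to the plain base UNet and resume from those latents.
pipe.unet = base_unet
result = pipe(
    prompt=prompt, image=latents, mask_image=mask,
    num_inference_steps=30, denoising_start=0.5,
).images[0]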
Yeah, I saw it afterwards; they switch to a custom model for inpainting. How good is the inpainting? Can any of you post an example? If it's really good, maybe I can try, or even better, someone from the diffusers team could, but they'll probably need solid proof to work on it.
before:
after:
I tried outpainting and it was amazingly realistic.
For inpainting, it blends well with the background.
Maybe consider loading it in comfy and saving it as overall weights and then using it in diffusers?
I tested this today; after exporting, I am not able to load it with this:
from diffusers import AutoPipelineForInpainting, StableDiffusionXLInpaintPipeline,StableDiffusionInpaintPipeline, DPMSolverMultistepScheduler, AutoencoderKL
import torch
from diffusers.utils import load_image, make_image_grid
pipeline = StableDiffusionXLInpaintPipeline.from_single_file("https://huggingface.co/WaterKnight/fooocus-inpaint/blob/main/fooocus_inpaint_unet.safetensors", torch_dtype=torch.float16).to("cuda")
generator = torch.Generator(device="cuda").manual_seed(33)
Error:
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py:118, in validate_hf_hub_args.<locals>._inner_fn(*args, **kwargs)
115 if check_use_auth_token:
116 kwargs = smoothly_deprecate_use_auth_token(fn_name=fn.__name__, has_token=has_token, kwargs=kwargs)
--> 118 return fn(*args, **kwargs)
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/diffusers/loaders/single_file.py:263, in FromSingleFileMixin.from_single_file(cls, pretrained_model_link_or_path, **kwargs)
249 file_path = file_path[len("main/") :]
251 pretrained_model_link_or_path = hf_hub_download(
252 repo_id,
253 filename=file_path,
(...)
260 force_download=force_download,
261 )
--> 263 pipe = download_from_original_stable_diffusion_ckpt(
264 pretrained_model_link_or_path,
265 pipeline_class=cls,
266 model_type=model_type,
267 stable_unclip=stable_unclip,
268 controlnet=controlnet,
269 adapter=adapter,
270 from_safetensors=from_safetensors,
271 extract_ema=extract_ema,
272 image_size=image_size,
273 scheduler_type=scheduler_type,
274 num_in_channels=num_in_channels,
275 upcast_attention=upcast_attention,
276 load_safety_checker=load_safety_checker,
277 prediction_type=prediction_type,
278 text_encoder=text_encoder,
279 text_encoder_2=text_encoder_2,
280 vae=vae,
281 tokenizer=tokenizer,
282 tokenizer_2=tokenizer_2,
283 original_config_file=original_config_file,
284 config_files=config_files,
285 local_files_only=local_files_only,
286 )
288 if torch_dtype is not None:
289 pipe.to(dtype=torch_dtype)
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/convert_from_ckpt.py:1445, in download_from_original_stable_diffusion_ckpt(checkpoint_path_or_dict, original_config_file, image_size, prediction_type, model_type, extract_ema, scheduler_type, num_in_channels, upcast_attention, device, from_safetensors, stable_unclip, stable_unclip_prior, clip_stats_path, controlnet, adapter, load_safety_checker, pipeline_class, local_files_only, vae_path, vae, text_encoder, text_encoder_2, tokenizer, tokenizer_2, config_files)
1442 unet_config["upcast_attention"] = upcast_attention
1444 path = checkpoint_path_or_dict if isinstance(checkpoint_path_or_dict, str) else ""
-> 1445 converted_unet_checkpoint = convert_ldm_unet_checkpoint(
1446 checkpoint, unet_config, path=path, extract_ema=extract_ema
1447 )
1449 ctx = init_empty_weights if is_accelerate_available() else nullcontext
1450 with ctx():
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/convert_from_ckpt.py:426, in convert_ldm_unet_checkpoint(checkpoint, config, path, extract_ema, controlnet, skip_extract_state_dict)
422 unet_state_dict[key.replace(unet_key, "")] = checkpoint.pop(key)
424 new_checkpoint = {}
--> 426 new_checkpoint["time_embedding.linear_1.weight"] = unet_state_dict["time_embed.0.weight"]
427 new_checkpoint["time_embedding.linear_1.bias"] = unet_state_dict["time_embed.0.bias"]
428 new_checkpoint["time_embedding.linear_2.weight"] = unet_state_dict["time_embed.2.weight"]
There seems to be something wrong with the size of the weights. If you only saved the UNet, you cannot load it through from_single_file.
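If the export really is just a UNet state dict (and already uses diffusers key names), something like this should work instead of from_single_file, which expects a full checkpoint; the file path here is a placeholder:
import torch
from safetensors.torch import load_file
from diffusers import UNet2DConditionModel, StableDiffusionXLInpaintPipeline

# Build an SDXL-base UNet skeleton and load the exported weights into it.
config = UNet2DConditionModel.load_config("stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet")
unet = UNet2DConditionModel.from_config(config)
state_dict = load_file("fooocus_inpaint_unet.safetensors")  # placeholder local path; diffusers-style keys assumed
unet.load_state_dict(state_dict)
unet = unet.to(torch.float16)

pipe = StableDiffusionXLInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", unet=unet, torch_dtype=torch.float16
).to("cuda")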
For example, in Fooocus inpainting, assuming 30 sampling steps are performed, xl_base_model + inpainting_model is used for the first 15 steps, and the last 15 steps switch back to xl_base_model alone; see here: https://github.com/lllyasviel/Fooocus/blob/main/modules/async_worker.py#L307
I have converted all the loras and the inpaint patch in Fooocus into diffusers-style format by matching keys; the inpaint head is also included. Using the realisticstockphotov1.0 diffusers checkpoint on HF, it can do the inpainting task fine.
But comparing my result with the Fooocus gradio UI result, I found that my result's quality is worse than the Fooocus UI; it has less detail. I'm sure that I have removed almost all the tricks in Fooocus, including prompt expansion, sharpness, ADM guidance, etc. Also, the images and the masks used in both pipelines are the same... any advice??
For example, in Fooocus inpainting, assuming 30 sampling steps are performed, xl_base_model + inpainting_model is used for the first 15 steps, and the last 15 steps switch back to xl_base_model alone; see here: https://github.com/lllyasviel/Fooocus/blob/main/modules/async_worker.py#L307
A question, why do you think that the inpaint patch is only used in the first 50% of the sampling?
For example, in Fooocus inpainting, assuming 30 sampling steps are performed, xl_base_model + inpainting_model is used for the first 15 steps, and the last 15 steps switch back to xl_base_model alone; see here: https://github.com/lllyasviel/Fooocus/blob/main/modules/async_worker.py#L307
I have converted all the loras and the inpaint patch in Fooocus into diffusers-style format by matching keys; the inpaint head is also included. Using the realisticstockphotov1.0 diffusers checkpoint on HF, it can do the inpainting task fine.
But comparing my result with the Fooocus gradio UI result, I found that my result's quality is worse than the Fooocus UI; it has less detail. I'm sure that I have removed almost all the tricks in Fooocus, including prompt expansion, sharpness, ADM guidance, etc. Also, the images and the masks used in both pipelines are the same... any advice??
Could you share this, please?
Interesting! I'm keeping my eyes on this :) do share your results and findings with us
@lawsonxwl any news???
For example, in Fooocus inpainting, assuming 30 sampling steps are performed, xl_base_model + inpainting_model is used for the first 15 steps, and the last 15 steps switch back to xl_base_model alone; see here: https://github.com/lllyasviel/Fooocus/blob/main/modules/async_worker.py#L307
A question, why do you think that the inpaint patch is only used in the first 50% of the sampling?
I have read the code and I'm sure of this; it also prints this in the console when generating.
For example, in Fooocus inpainting, assuming 30 sampling steps are performed, xl_base_model + inpainting_model is used for the first 15 steps, and the last 15 steps switch back to xl_base_model alone; see here: https://github.com/lllyasviel/Fooocus/blob/main/modules/async_worker.py#L307
I have converted all the loras and the inpaint patch in Fooocus into diffusers-style format by matching keys; the inpaint head is also included. Using the realisticstockphotov1.0 diffusers checkpoint on HF, it can do the inpainting task fine. But comparing my result with the Fooocus gradio UI result, I found that my result's quality is worse than the Fooocus UI; it has less detail. I'm sure that I have removed almost all the tricks in Fooocus, including prompt expansion, sharpness, ADM guidance, etc. Also, the images and the masks used in both pipelines are the same... any advice??
Could you share this, please?
Sorry, as it is restricted by regulations, I cannot share the code with you. If you want to migrate Fooocus to diffusers, you have to check almost all the code in the Fooocus project... really overwhelming. After several rounds of optimization, the quality of my pipeline's results is quite close to the Fooocus webui (in my personal view).
For example, in Fooocus inpainting, assuming 30 sampling steps are performed, xl_base_model + inpainting_model is used for the first 15 steps, and the last 15 steps switch back to xl_base_model alone; see here: https://github.com/lllyasviel/Fooocus/blob/main/modules/async_worker.py#L307
A question, why do you think that the inpaint patch is only used in the first 50% of the sampling?
I have read the code and I'm sure of this; it also prints this in the console when generating.
Yes, you are absolutely right. Do you mind leaving your WeChat? We can talk about this.
Yes, you are absolutely right. Do you mind leaving your WeChat? We can talk about this.
laidawang233
if you want another resource to look at:
https://github.com/Acly/comfyui-inpaint-nodes
Adds two nodes which allow using Fooocus inpaint model. It's a small and flexible patch which can be applied to any SDXL checkpoint and will transform it into an inpaint model. This model can then be used like other inpaint models, and provides the same benefits.
It also has other cool stuff for inpainting. I will try them too, and I think that combined with this: #7038, the inpainting would be really good now.
@asomoza keep us updated!
For example, in Fooocus inpainting, assuming 30 sampling steps are performed, xl_base_model + inpainting_model is used for the first 15 steps, and the last 15 steps switch back to xl_base_model alone; see here: https://github.com/lllyasviel/Fooocus/blob/main/modules/async_worker.py#L307
I have converted all the loras and the inpaint patch in Fooocus into diffusers-style format by matching keys; the inpaint head is also included. Using the realisticstockphotov1.0 diffusers checkpoint on HF, it can do the inpainting task fine. But comparing my result with the Fooocus gradio UI result, I found that my result's quality is worse than the Fooocus UI; it has less detail. I'm sure that I have removed almost all the tricks in Fooocus, including prompt expansion, sharpness, ADM guidance, etc. Also, the images and the masks used in both pipelines are the same... any advice??
Could you share this, please?
Sorry, as it is restricted by regulations, I cannot share the code with you. If you want to migrate Fooocus to diffusers, you have to check almost all the code in the Fooocus project... really overwhelming. After several rounds of optimization, the quality of my pipeline's results is quite close to the Fooocus webui (in my personal view).
@lawsonxwl @WaterKnight1998 @yiyixuxu
Hi, so we at Dashtoon are also working on our custom diffusers pipeline to get the best out of inpainting using SDXL inpaint. I have also been going through the Fooocus codebase to merge Fooocus's inpaint patch model into the HF diffusers UNet layers. So far, I have managed to include the inpaint head module in the UNet and merge the inpaint patch model layers into the HF UNet layers by matching keys, as @lawsonxwl also mentioned. And yes, it is quite overwhelming to navigate the Fooocus codebase..!
One thing to note is that it is not exactly a lora. It basically replaces the original pretrained weight tensor (let's say w_orig) of the UNet for a given key (from a set of keys whose weights need to be updated) with a new weight tensor w_new. This w_new is calculated from three tensors w1, w_max, w_min, which you get from the inpaint patch model dict (inpaint_v26.fooocus.patch), where the key is the UNet key (to be mapped to the diffusers UNet) and the value is a tuple of those three tensors. So w_new = w_orig + (w1 / 255.0) * (w_max - w_min) + w_min. If w_orig has shape (320, 320, 3, 3), then w1 has the same shape as w_orig, and w_max and w_min both have shape (320, 1, 3, 3), which makes sense, as I believe it really is a shifting and scaling operation as in the formula above.
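A minimal sketch of that update, assuming the patch dict has already been remapped to diffusers UNet parameter names (which is the tedious part):
import torch

def apply_fooocus_inpaint_patch(unet, patch):
    # patch: dict mapping diffusers UNet parameter names -> (w1, w_max, w_min) tuples (assumed already remapped).
    state_dict = unet.state_dict()
    for key, (w1, w_max, w_min) in patch.items():
        w_orig = state_dict[key]
        # w_new = w_orig + (w1 / 255) * (w_max - w_min) + w_min
        # w1 matches w_orig's shape; w_max / w_min broadcast over the input-channel dim.
        w_new = w_orig.float() + (w1.float() / 255.0) * (w_max.float() - w_min.float()) + w_min.float()
        state_dict[key] = w_new.to(w_orig.dtype)
    unet.load_state_dict(state_dict)
    return unet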
But the problem is that when I tested using the default params of the SDXL inpaint pipeline with just the inpaint head, I am getting something like this in the generated result (first is the input image, 2nd is the mask, 3rd is the generated image using the default SDXL inpaint pipeline without the Fooocus inpaint head, 4th is using the Fooocus inpaint head):
Also, if I use just the inpaint patch model, I am currently getting something like below:
Prompt used in both the cases for inpainting was "Young Female, Blue Eyes, Brown Long Hair"
I haven't implemented any other changes from Fooocus yet.
@lawsonxwl any idea why this might be happening in both cases, especially when using the Fooocus inpaint patch model? What could I possibly be missing?
@quark-toon I believe you forgot to disable passing the extra inpaint_features to the UNet after you've unloaded the Fooocus lora/patch. Also make sure you add the inpaint_features right after conv_in.
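One way to do that in diffusers without rewriting the UNet forward is a forward hook on conv_in; a rough sketch (inpaint_feature here stands for the 320-channel output of the inpaint head, computed once per generation):
import torch
import torch.nn.functional as F

def install_inpaint_head_hook(unet, inpaint_feature):
    # Add the inpaint head feature to the activations right after conv_in.
    def hook(module, args, output):
        # Resize in case the latent resolution differs from the feature's, then add it.
        feat = F.interpolate(inpaint_feature.to(output.dtype), size=output.shape[-2:])
        return output + feat
    return unet.conv_in.register_forward_hook(hook)

# handle = install_inpaint_head_hook(pipe.unet, inpaint_feature)
# ... run the patched first half of sampling ...
# handle.remove()  # stop adding the feature once the Fooocus patch is unloaded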
you can also message me in Telegram at bonlime if you want to debug this together