
Adding additional input channels to a model after initialization

BenjaminIrwin opened this issue

I have scoured the docs for an answer to this, to no avail. Is it possible to add additional input channels to a model after initializing it using .from_pretrained?

For example (taken from your Dreambooth example):

    unet = UNet2DConditionModel.from_pretrained(
        args.pretrained_model_name_or_path,
        subfolder="unet",
        revision=args.revision,
    )

In the code above, if I now wanted to introduce additional input channels to unet and zero-initialize the weights, would this be possible? If so, how would I do this?

Thank you in advance.

BenjaminIrwin avatar Dec 08 '22 22:12 BenjaminIrwin

Hey @BenjaminIrwin,

This is actually quite easy to do. You just need to pass a config parameter that changes the number of input channels to the required size. For example, let's say you want to fine-tune SD 1.4 to do inpainting. All you need to do is run the following code:

from diffusers import UNet2DConditionModel

model_id = "CompVis/stable-diffusion-v1-4"
unet = UNet2DConditionModel.from_pretrained(
    model_id,
    subfolder="unet",
    in_channels=9,
    low_cpu_mem_usage=False,
    ignore_mismatched_sizes=True,
)

This will initialize the model with the pretrained weights, except for the input conv weight, whose shape has changed. You will see a warning like:

Some weights of UNet2DConditionModel were not initialized from the model checkpoint at CompVis/stable-diffusion-v1-4 and are newly initialized because the shapes did not match:
- conv_in.weight: found shape torch.Size([320, 4, 3, 3]) in the checkpoint and torch.Size([320, 9, 3, 3]) in the model instantiated

The new conv_in weight is therefore randomly initialized; all other weights are transferred from the pretrained checkpoint.

Make sure to pass both low_cpu_mem_usage=False and ignore_mismatched_sizes=True. First, you cannot use the super-fast low-CPU-memory loading here because it does not check the weights for shape mismatches, so it has to be disabled. Second, if you do not pass ignore_mismatched_sizes=True, an error will be thrown.
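
As a quick sanity check (an illustrative addition, not part of the original answer), you can confirm the expanded input layer after loading:

print(unet.config.in_channels)    # 9
print(unet.conv_in.weight.shape)  # torch.Size([320, 9, 3, 3])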

patrickvonplaten avatar Dec 11 '22 16:12 patrickvonplaten

Thanks very much. This is great.

BenjaminIrwin avatar Dec 14 '22 15:12 BenjaminIrwin

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Jan 08 '23 15:01 github-actions[bot]

For future readers:

The code snippet above can be used to transform a text2image unet into an inpainting unet, as asked here: https://github.com/huggingface/diffusers/issues/2280

patrickvonplaten avatar Feb 08 '23 16:02 patrickvonplaten

@patrickvonplaten this is exactly the issue I was looking for!

So I forked a popular Hugging Face Space to create a custom DreamBooth model, training it against a couple of new concepts: https://huggingface.co/spaces/multimodalart/dreambooth-training

It's great! I've used it a few times and generated a few v1.5-based custom models!

I thought I could use a custom-trained model based on SD v1.5 and that it would work with the inpainting pipeline out of the box... oh how wrong I was :)

I've tried to change my fork to use SD v1.5-inpainting as the base model, but no luck debugging the workspace.

And then I saw this issue, which, if I'm not mistaken... should allow me to use my regular SD v1.5 model with the inpainting pipeline? Am I mistaken here?

I used your suggestion:

import torch
from diffusers import UNet2DConditionModel, StableDiffusionInpaintPipeline

model_path = "mycustommodel"

# Load the UNet with 9 input channels, as suggested above
unet = UNet2DConditionModel.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    subfolder="unet",
    in_channels=9,
    low_cpu_mem_usage=False,
    ignore_mismatched_sizes=True,
)

# scheduler is defined elsewhere in my script
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    model_path,
    scheduler=scheduler,
    torch_dtype=torch.float16,
    unet=unet,
    safety_checker=None,
)

# For reference, what I'm doing to set up inference... some M1 MacBook specific stuff here
pipe = pipe.to("mps")

# Fixed random seed for reproducibility
gen = torch.Generator(device="cpu")
seed = 52362
gen.manual_seed(seed)

negative_prompt = ""
num_samples = 1
guidance_scale = 7.5
num_inference_steps = 25
height = 512
width = 512

# prompt, init_image, and mask_image are defined earlier in my script
images = pipe(
    prompt=prompt,
    image=init_image,
    mask_image=mask_image,
    generator=gen,
    height=height,
    width=width,
    negative_prompt=negative_prompt,
    num_images_per_prompt=num_samples,
    num_inference_steps=num_inference_steps,
    guidance_scale=guidance_scale,
).images

So I do that, and finally I'm not getting the same UNet error I was getting before your suggestion. However, the inference isn't quite working: the generated image looks like pure noise.

@patrickvonplaten Am I completely on the wrong path here? Is my only real option to train a new custom model with SD v1.5-inpainting as the base model?

Thanks in advance!

hamin avatar Feb 15 '23 17:02 hamin

You need to fine-tune the text-to-image model so it learns how to do inpainting. The architecture is slightly different.
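
For context, here is a rough illustration of how the architectures differ: the Stable Diffusion inpainting UNet takes 9 input channels instead of 4, formed by concatenating the noisy latents, the downsampled mask, and the latents of the masked image. A minimal sketch (the tensor shapes and names below are illustrative):

import torch

# 4 channels: noised latents of the target image
noisy_latents = torch.randn(1, 4, 64, 64)
# 1 channel: binary mask downsampled to latent resolution (1 = region to repaint)
mask = torch.ones(1, 1, 64, 64)
# 4 channels: VAE latents of the image with the masked region blanked out
masked_image_latents = torch.randn(1, 4, 64, 64)

unet_input = torch.cat([noisy_latents, mask, masked_image_latents], dim=1)
print(unet_input.shape)  # torch.Size([1, 9, 64, 64])

A freshly expanded conv_in has random weights for the extra channels, which is why simply swapping the UNet into the inpainting pipeline without fine-tuning produces noise.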

patrickvonplaten avatar Feb 16 '23 13:02 patrickvonplaten

@patrickvonplaten not sure how to do that... Interestingly enough, I ended up using the popular SD GUI https://github.com/AUTOMATIC1111/stable-diffusion-webui

It has a nice UI to merge models. So I merged v1-5-inpainting, my custom model, and v1-5-pruned.

Found out about it here: https://www.reddit.com/r/StableDiffusion/comments/zyi24j/how_to_turn_any_model_into_an_inpainting_model/

That worked for me. @patrickvonplaten is your code approach essentially doing the same thing? Still learning a lot about diffusion models, so apologies. And of course, thank you for all of your work!

hamin avatar Feb 16 '23 18:02 hamin

@hamin @patrickvonplaten I've been trying to do the same thing as @hamin. Still, after looking at these issues, converting an existing model to an inpainting model through Python scripts seems like the better approach. @patrickvonplaten, since you mentioned that "You need to fine-tune the text-to-image model so it learns how to do inpainting. The architecture is slightly different", could you explain how to do that, or is there a script available on the internet for it?

I hope you'll get back to me soon. Regards, SS

satwiksunnam19 avatar Feb 27 '23 07:02 satwiksunnam19

Opened a PR to improve error handling for the above case btw: https://github.com/huggingface/diffusers/pull/2847

patrickvonplaten avatar Mar 27 '23 18:03 patrickvonplaten

(Quoting @patrickvonplaten's answer and code snippet above.)

This is great. Maybe this snippet could have a place in the documentation somewhere /cc @yiyixuxu what do you think?

pcuenca avatar Mar 28 '23 07:03 pcuenca

@patrickvonplaten I was looking for this answer in the documentation. It would be great to have this featured more prominently in the main docs.

rob-hen avatar Apr 18 '23 14:04 rob-hen

RuntimeError: Given groups=1, weight of size [320, 9, 3, 3], expected input[4, 4, 64, 64] to have 9 channels, but got 4 channels instead

manxiaoyu avatar Jul 28 '23 08:07 manxiaoyu

@patrickvonplaten

(Quoting @patrickvonplaten's note above that the new conv_in weight is randomly initialized, while the other weights are transferred from the pretrained checkpoint.)

As you said, they are randomly initialized, meaning the learned kernels in the conv_in layer are gone. Shouldn't it be possible to take the kernels of the pretrained model and zero-initialize only the new ones? Like this, for example:

import torch

# Load the pretrained 4-channel conv_in weights from a file
conv_in_weights_pretrained = torch.load("conv_in_weights_pretrained.pt")  # shape [320, 4, 3, 3]
# Copy the pretrained kernels into the first 4 input channels of the new conv_in
unet.conv_in.weight.data[:, :4, :, :] = conv_in_weights_pretrained.data
# Zero-initialize the remaining (new) input channels
unet.conv_in.weight.data[:, 4:, :, :] = 0
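
For reference, a more self-contained sketch of the same idea, assuming the pretrained kernels are recovered by loading the original 4-channel UNet rather than from a saved file (an illustrative sketch, not code from this thread):

import torch
from diffusers import UNet2DConditionModel

model_id = "CompVis/stable-diffusion-v1-4"

# Expanded 9-channel UNet: conv_in is randomly initialized, everything else is pretrained
unet = UNet2DConditionModel.from_pretrained(
    model_id, subfolder="unet", in_channels=9,
    low_cpu_mem_usage=False, ignore_mismatched_sizes=True,
)

# Original 4-channel UNet, loaded only to recover the pretrained conv_in parameters
unet_orig = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")

with torch.no_grad():
    unet.conv_in.weight.zero_()                            # zero-initialize all 9 input channels
    unet.conv_in.weight[:, :4] = unet_orig.conv_in.weight  # restore the pretrained kernels for the first 4
    unet.conv_in.bias.copy_(unet_orig.conv_in.bias)        # bias shape is unchanged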

Thanks

TimAlexander avatar Jan 06 '24 15:01 TimAlexander

(Quoting @patrickvonplaten's snippet and the follow-up discussion above.)

Hi All,

I am getting the following error. Can someone help, please?

File "C:\SDComfyUI\ComfyUI_windows_portable\src\diffusers\src\diffusers\models\modeling_utils.py", line 154, in load_model_dict_into_meta raise ValueError( ValueError: Cannot load C:\Users\xxx.cache\huggingface\hub\models--stabilityai--stable-cascade\snapshots\f2a84281d6f8db3c757195dd0c9a38dbdea90bb4\decoder because embedding.1.weight expected shape tensor(..., device='meta', size=(320, 64, 1, 1)), but got torch.Size([320, 16, 1, 1]). If you want to instead overwrite randomly initialized weights, please make sure to pass both low_cpu_mem_usage=False and ignore_mismatched_sizes=True. For more information, see also: https://github.com/huggingface/diffusers/issues/1619#issuecomment-1345604389 as an example.

AIPopcorn avatar Feb 17 '24 18:02 AIPopcorn

(Quoting the error report above.)

I got the same error

peki12345 avatar Feb 18 '24 03:02 peki12345

low_cpu_mem_usage=False and ignore_mismatched_sizes=True

Sorry, but how do I pass these in?

(Quoting @patrickvonplaten's answer above.)

Where do I have to run this code, and how do I pass those two parameters?

Sturmkater avatar Feb 23 '24 16:02 Sturmkater