[Dreambooth] number of channels error in train_dreambooth.py
Describe the bug
I am trying to use train_dreambooth.py to train a personalized model by following https://github.com/huggingface/diffusers/tree/main/examples/dreambooth. I got the following error:
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\torch\nn\modules\conv.py", line 458, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [320, 4, 3, 3], expected input[1, 3, 512, 512] to have 4 channels, but got 3 channels instead
A bit of context:
The error comes from a mismatch in the number of channels between the conv2d weights and the input: the input has 3 channels while conv2d expects 4. However, when I run train_dreambooth_lora.py with the same input, no such mismatch occurs. In fact, the same set of images trains fine with train_dreambooth_lora_sdxl.py.
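For context, here is a minimal, self-contained sketch (illustrative only, not code from the training script) that reproduces the mismatch in isolation: a conv layer with in_channels=4, like the UNet's conv_in with weight [320, 4, 3, 3], rejects a raw 3-channel image and only accepts 4-channel VAE latents.

import torch

# Stand-in for the UNet's conv_in: weight shape [320, 4, 3, 3]
conv_in = torch.nn.Conv2d(4, 320, kernel_size=3, padding=1)

image = torch.randn(1, 3, 512, 512)   # raw sRGB batch, as in the traceback
try:
    conv_in(image)
except RuntimeError as err:
    print(err)  # "... expected input[1, 3, 512, 512] to have 4 channels, but got 3 channels instead"

latents = torch.randn(1, 4, 64, 64)   # shape of a VAE-encoded 512x512 image
print(conv_in(latents).shape)         # torch.Size([1, 320, 64, 64])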
Reproduction
I ran this PowerShell script:
$env:MODEL_NAME = "jzli/majicMIX-realistic-7"
$env:INSTANCE_DIR = "dog"
$env:OUTPUT_DIR = "dreambooth-majicMIX-dog"
& accelerate launch train_dreambooth.py `
--pretrained_model_name_or_path $env:MODEL_NAME `
--instance_data_dir $env:INSTANCE_DIR `
--output_dir $env:OUTPUT_DIR `
--mixed_precision "fp16" `
--instance_prompt "a photo of a [V] dog" `
--class_prompt "a photo of a dog" `
--resolution 512 `
--train_batch_size 1 `
--gradient_accumulation_steps 2 `
--learning_rate 5e-6 `
--report_to "wandb" `
--lr_scheduler "constant" `
--gradient_checkpointing `
--use_8bit_adam `
--train_text_encoder `
--lr_warmup_steps 0 `
--max_train_steps 800 `
--checkpointing_steps 50 `
--validation_prompt "A photo of a [V] dog in a bucket" `
--seed "0"
Logs
C:\DL\diffusers\examples\dreambooth\train_dreambooth.py:602: UserWarning: You need not use --class_prompt without --with_prior_preservation.
warnings.warn("You need not use --class_prompt without --with_prior_preservation.")
04/09/2024 13:43:45 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: fp16
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'dynamic_thresholding_ratio', 'variance_type', 'rescale_betas_zero_snr', 'thresholding', 'clip_sample_range', 'sample_max_value'} was not found in config. Values will be initialized to default values.
{'reverse_transformer_layers_per_block'} was not found in config. Values will be initialized to default values.
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: 3
wandb: You chose "Don't visualize my results"
wandb: Tracking run with wandb version 0.16.5
wandb: W&B syncing is set to `offline` in this directory.
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
04/09/2024 13:44:02 - INFO - __main__ - ***** Running training *****
04/09/2024 13:44:02 - INFO - __main__ - Num examples = 5
04/09/2024 13:44:02 - INFO - __main__ - Num batches each epoch = 5
04/09/2024 13:44:02 - INFO - __main__ - Num Epochs = 267
04/09/2024 13:44:02 - INFO - __main__ - Instantaneous batch size per device = 1
04/09/2024 13:44:02 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 2
04/09/2024 13:44:02 - INFO - __main__ - Gradient Accumulation steps = 2
04/09/2024 13:44:02 - INFO - __main__ - Total optimization steps = 800
Steps: 0%| | 0/800 [00:00<?, ?it/s]img dim: torch.Size([1, 3, 512, 512])
Traceback (most recent call last):
File "C:\DL\diffusers\examples\dreambooth\train_dreambooth.py", line 1440, in <module>
main(args)
File "C:\DL\diffusers\examples\dreambooth\train_dreambooth.py", line 1270, in main
model_pred = unet(
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\accelerate\utils\operations.py", line 822, in forward
return model_forward(*args, **kwargs)
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\accelerate\utils\operations.py", line 810, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\torch\amp\autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
File "C:\DL\diffusers\src\diffusers\models\unets\unet_2d_condition.py", line 1169, in forward
sample = self.conv_in(sample)
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\torch\nn\modules\conv.py", line 462, in forward
return self._conv_forward(input, self.weight, self.bias)
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\torch\nn\modules\conv.py", line 458, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [320, 4, 3, 3], expected input[1, 3, 512, 512] to have 4 channels, but got 3 channels instead
wandb: You can sync this run to the cloud by running:
wandb: wandb sync C:\DL\diffusers\examples\dreambooth\wandb\offline-run-20240409_134401-ligzdtih
wandb: Find logs at: .\wandb\offline-run-20240409_134401-ligzdtih\logs
Traceback (most recent call last):
File "C:\WPy64-31090\python-3.10.9.amd64\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\WPy64-31090\python-3.10.9.amd64\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\WPy64-31090\python-3.10.9.amd64\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\accelerate\commands\accelerate_cli.py", line 46, in main
args.func(args)
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\accelerate\commands\launch.py", line 1057, in launch_command
simple_launcher(args)
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\accelerate\commands\launch.py", line 673, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\WPy64-31090\\python-3.10.9.amd64\\python.exe', 'train_dreambooth.py', '--pretrained_model_name_or_path', 'jzli/majicMIX-realistic-7', '--instance_data_dir', 'dog', '--output_dir', 'dreambooth-majicMIX-dog', '--mixed_precision', 'fp16', '--instance_prompt', 'a photo of a [V] dog', '--class_prompt', 'a photo of a dog', '--resolution', '512', '--train_batch_size', '1', '--gradient_accumulation_steps', '2', '--learning_rate', '5e-6', '--report_to', 'wandb', '--lr_scheduler', 'constant', '--gradient_checkpointing', '--use_8bit_adam', '--train_text_encoder', '--lr_warmup_steps', '0', '--max_train_steps', '800', '--checkpointing_steps', '50', '--validation_prompt', 'A photo of a [V] dog in a bucket', '--seed', '0']' returned non-zero exit status 1.
System Info
- diffusers version: 0.28.0.dev0
- Platform: Windows-10-10.0.22631-SP0
- Python version: 3.10.9
- PyTorch version (GPU?): 2.2.2+cu121 (True)
- Huggingface_hub version: 0.22.2
- Transformers version: 4.39.2
- Accelerate version: 0.28.0
- xFormers version: 0.0.25.post1
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
Who can help?
@sayakpaul
I am unable to reproduce. Could you look at this Colab notebook? Are you sure you're running train_dreambooth.py and all the other train_xxx.py scripts in the exact same environment?
@standardAI, thanks for trying to reproduce this. I just pulled the latest diffusers main and got the same error.
Yes, I did run all the train_xxx.py scripts in the same env, and all the others work fine.
From the error it looks like the underlying UNet isn't a compatible one. For this script to work properly, we need to have the UNet in a compatible structure. For example, it should have 4 channels and NOT 3 as the one reflected here.
@sayakpaul, I don't think the UNet is the problem after loading the weights from "jzli/majicMIX-realistic-7". Both in_channels and out_channels are 4. The input images having 3 channels is also correct, because they are in sRGB format. The problem is how these two are connected via conv2d in train_dreambooth.py. However, train_dreambooth_lora.py works fine, and there I don't see a conv2d op applied to a 3-channel tensor by a 4-channel weight. It's really odd.
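For completeness, a small sketch of how that check can be done (assuming the repo has the standard diffusers layout with a unet subfolder):

from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained("jzli/majicMIX-realistic-7", subfolder="unet")
print(unet.config.in_channels, unet.config.out_channels)  # both 4 for this checkpoint
print(unet.conv_in.weight.shape)                          # torch.Size([320, 4, 3, 3])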
I see. Would it be possible for you to do the following?
- Load the UNet from your checkpoint.
- Run a single forward pass on a single training image from your dataset.
This way we should be able to localize the problem and compare it with train_dreambooth_lora.py.
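A rough sketch of what such a check could look like (the image path, prompt, preprocessing, and dtype below are illustrative assumptions, not code from the training script):

import torch
from PIL import Image
from torchvision import transforms
from diffusers import AutoencoderKL, UNet2DConditionModel
from transformers import CLIPTextModel, CLIPTokenizer

model_id = "jzli/majicMIX-realistic-7"
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")

# Same preprocessing idea as the DreamBooth dataset: resize, crop, normalize to [-1, 1]
preprocess = transforms.Compose([
    transforms.Resize(512),
    transforms.CenterCrop(512),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),
])
pixels = preprocess(Image.open("dog/00.jpg").convert("RGB")).unsqueeze(0)  # hypothetical file name

with torch.no_grad():
    # Encode the image into 4-channel latents, as the training loop should
    latents = vae.encode(pixels).latent_dist.sample() * vae.config.scaling_factor
    ids = tokenizer(
        "a photo of a [V] dog", padding="max_length",
        max_length=tokenizer.model_max_length, truncation=True, return_tensors="pt",
    ).input_ids
    hidden = text_encoder(ids)[0]
    timestep = torch.tensor([500], dtype=torch.long)
    pred = unet(latents, timestep, encoder_hidden_states=hidden).sample

print(latents.shape, pred.shape)  # both torch.Size([1, 4, 64, 64])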
I just noticed:
File "C:\DL\diffusers\examples\dreambooth\train_dreambooth.py", line 1440, in <module>
main(args)
File "C:\DL\diffusers\examples\dreambooth\train_dreambooth.py", line 1270, in main
model_pred = unet(
This implies that there are 3 missing lines in your train_dreambooth.py file, where the main(args) line should have been 1443. Could you tell us what those 3 lines are, or whether they are irrelevant?
Sure, they are like this:
# Predict the noise residual
model_pred = unet(
    noisy_model_input, timesteps, encoder_hidden_states, class_labels=class_labels, return_dict=False
)[0]
Thanks for the tip, @sayakpaul. I'll try to debug this way.
I mean that train_dreambooth.py should have 1443 lines of code, but I think yours has 1440. I question this difference. Could you confirm it?
Sorry, I misread. Yes, train_dreambooth.py has 1443 lines. Now the log looks like this:
04/10/2024 07:48:15 - INFO - __main__ - Total optimization steps = 800
Steps: 0%| | 0/800 [00:00<?, ?it/s]unet_forward: sample size is torch.Size([1, 3, 512, 512])
Traceback (most recent call last):
File "C:\DL\diffusers\examples\dreambooth\train_dreambooth.py", line 1443, in <module>
main(args)
File "C:\DL\diffusers\examples\dreambooth\train_dreambooth.py", line 1273, in main
model_pred = unet(
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\accelerate\utils\operations.py", line 822, in forward
return model_forward(*args, **kwargs)
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\accelerate\utils\operations.py", line 810, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\torch\amp\autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
File "C:\DL\diffusers\src\diffusers\models\unets\unet_2d_condition.py", line 1171, in forward
sample = self.conv_in(sample)
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\torch\nn\modules\conv.py", line 462, in forward
return self._conv_forward(input, self.weight, self.bias)
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\torch\nn\modules\conv.py", line 458, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [320, 4, 3, 3], expected input[1, 3, 512, 512] to have 4 channels, but got 3 channels instead
Steps: 0%| | 0/800 [00:01<?, ?it/s]
Traceback (most recent call last):
I just updated diffusers to main. Now I see a problem:
https://github.com/huggingface/diffusers/blob/37e9d695af0ed1692b8c82fd2af32cb4c0854f81/examples/dreambooth/train_dreambooth.py#L1228-L1233
When I run train_dreambooth.py, the condition if vae is not None: evaluates to False, so model_input becomes pixel_values. This cannot be right. However, when I run train_dreambooth_lora.py, if vae is not None: is True and model_input comes out as [1, 4, 64, 64], i.e. with 4 channels. So the problem is that the VAE encode step is somehow skipped in train_dreambooth.py.
EDIT: Yes, the problem is actually this:
https://github.com/huggingface/diffusers/blob/37e9d695af0ed1692b8c82fd2af32cb4c0854f81/examples/dreambooth/train_dreambooth.py#L936-L941
For some reason model_has_vae(args) returns False, so the VAE is not loaded from the model file. But in train_dreambooth_lora.py this part is done differently:
https://github.com/huggingface/diffusers/blob/37e9d695af0ed1692b8c82fd2af32cb4c0854f81/examples/dreambooth/train_dreambooth_lora.py#L863-L870
If train_dreambooth.py follows train_dreambooth_lora.py in loading the VAE, the problem is fixed.
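For reference, a sketch of the loading pattern that train_dreambooth_lora.py uses (reconstructed from memory, not a verbatim copy): try to load the VAE directly from the checkpoint and fall back to None only if the repo genuinely has no vae subfolder, instead of relying on a model_has_vae() pre-check.

from diffusers import AutoencoderKL

def load_vae(pretrained_model_name_or_path, revision=None):
    # Load the VAE if the checkpoint ships one; otherwise return None
    try:
        return AutoencoderKL.from_pretrained(
            pretrained_model_name_or_path, subfolder="vae", revision=revision
        )
    except OSError:
        # Some checkpoints (e.g. DeepFloyd IF) have no VAE at all
        return None

vae = load_vae("jzli/majicMIX-realistic-7")  # should not be None for this model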
Can you check if the VAE is getting properly loaded?
No, it is not. See my updated post above.
#3454
Hah, they already fixed this issue, but only for train_dreambooth_lora.py. I'll submit a PR fixing train_dreambooth.py.
@sayakpaul , @standardAI , thank you for your help.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
The fix is ready but still hasn't been merged.