[Dreambooth] number of channels error in train_dreambooth.py
Describe the bug
I am trying to use train_dreambooth.py to train a personalized model by following https://github.com/huggingface/diffusers/tree/main/examples/dreambooth. I got the following error:
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\torch\nn\modules\conv.py", line 458, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [320, 4, 3, 3], expected input[1, 3, 512, 512] to have 4 channels, but got 3 channels instead
A bit of context:
The error comes from a mismatch in the number of channels between the conv2d weights and the input: the input has 3 channels while conv2d expects 4. However, when I run train_dreambooth_lora.py with the same input, no such mismatch occurs. In fact, the same set of images trains fine with train_dreambooth_lora_sdxl.py.
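For context, here is a minimal, self-contained sketch (illustrative only, not code from the training script) that reproduces the mismatch in isolation: a conv layer with in_channels=4, like the UNet's conv_in with weight [320, 4, 3, 3], rejects a raw 3-channel image and only accepts 4-channel VAE latents.

import torch

# Stand-in for the UNet's conv_in: weight shape [320, 4, 3, 3]
conv_in = torch.nn.Conv2d(4, 320, kernel_size=3, padding=1)

image = torch.randn(1, 3, 512, 512)   # raw sRGB batch, as in the traceback
try:
    conv_in(image)
except RuntimeError as err:
    print(err)  # "... expected input[1, 3, 512, 512] to have 4 channels, but got 3 channels instead"

latents = torch.randn(1, 4, 64, 64)   # shape of a VAE-encoded 512x512 image
print(conv_in(latents).shape)         # torch.Size([1, 320, 64, 64])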
Reproduction
I ran this PowerShell script:
$env:MODEL_NAME = "jzli/majicMIX-realistic-7"
$env:INSTANCE_DIR = "dog"
$env:OUTPUT_DIR = "dreambooth-majicMIX-dog"
& accelerate launch train_dreambooth.py `
--pretrained_model_name_or_path $env:MODEL_NAME `
--instance_data_dir $env:INSTANCE_DIR `
--output_dir $env:OUTPUT_DIR `
--mixed_precision "fp16" `
--instance_prompt "a photo of a [V] dog" `
--class_prompt "a photo of a dog" `
--resolution 512 `
--train_batch_size 1 `
--gradient_accumulation_steps 2 `
--learning_rate 5e-6 `
--report_to "wandb" `
--lr_scheduler "constant" `
--gradient_checkpointing `
--use_8bit_adam `
--train_text_encoder `
--lr_warmup_steps 0 `
--max_train_steps 800 `
--checkpointing_steps 50 `
--validation_prompt "A photo of a [V] dog in a bucket" `
--seed "0"
Logs
C:\DL\diffusers\examples\dreambooth\train_dreambooth.py:602: UserWarning: You need not use --class_prompt without --with_prior_preservation.
warnings.warn("You need not use --class_prompt without --with_prior_preservation.")
04/09/2024 13:43:45 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: fp16
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'dynamic_thresholding_ratio', 'variance_type', 'rescale_betas_zero_snr', 'thresholding', 'clip_sample_range', 'sample_max_value'} was not found in config. Values will be initialized to default values.
{'reverse_transformer_layers_per_block'} was not found in config. Values will be initialized to default values.
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: 3
wandb: You chose "Don't visualize my results"
wandb: Tracking run with wandb version 0.16.5
wandb: W&B syncing is set to `offline` in this directory.
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
04/09/2024 13:44:02 - INFO - __main__ - ***** Running training *****
04/09/2024 13:44:02 - INFO - __main__ - Num examples = 5
04/09/2024 13:44:02 - INFO - __main__ - Num batches each epoch = 5
04/09/2024 13:44:02 - INFO - __main__ - Num Epochs = 267
04/09/2024 13:44:02 - INFO - __main__ - Instantaneous batch size per device = 1
04/09/2024 13:44:02 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 2
04/09/2024 13:44:02 - INFO - __main__ - Gradient Accumulation steps = 2
04/09/2024 13:44:02 - INFO - __main__ - Total optimization steps = 800
Steps: 0%| | 0/800 [00:00<?, ?it/s]img dim: torch.Size([1, 3, 512, 512])
Traceback (most recent call last):
File "C:\DL\diffusers\examples\dreambooth\train_dreambooth.py", line 1440, in <module>
main(args)
File "C:\DL\diffusers\examples\dreambooth\train_dreambooth.py", line 1270, in main
model_pred = unet(
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\accelerate\utils\operations.py", line 822, in forward
return model_forward(*args, **kwargs)
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\accelerate\utils\operations.py", line 810, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\torch\amp\autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
File "C:\DL\diffusers\src\diffusers\models\unets\unet_2d_condition.py", line 1169, in forward
sample = self.conv_in(sample)
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\torch\nn\modules\conv.py", line 462, in forward
return self._conv_forward(input, self.weight, self.bias)
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\torch\nn\modules\conv.py", line 458, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [320, 4, 3, 3], expected input[1, 3, 512, 512] to have 4 channels, but got 3 channels instead
wandb: You can sync this run to the cloud by running:
wandb: wandb sync C:\DL\diffusers\examples\dreambooth\wandb\offline-run-20240409_134401-ligzdtih
wandb: Find logs at: .\wandb\offline-run-20240409_134401-ligzdtih\logs
Traceback (most recent call last):
File "C:\WPy64-31090\python-3.10.9.amd64\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\WPy64-31090\python-3.10.9.amd64\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\WPy64-31090\python-3.10.9.amd64\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\accelerate\commands\accelerate_cli.py", line 46, in main
args.func(args)
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\accelerate\commands\launch.py", line 1057, in launch_command
simple_launcher(args)
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\accelerate\commands\launch.py", line 673, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\WPy64-31090\\python-3.10.9.amd64\\python.exe', 'train_dreambooth.py', '--pretrained_model_name_or_path', 'jzli/majicMIX-realistic-7', '--instance_data_dir', 'dog', '--output_dir', 'dreambooth-majicMIX-dog', '--mixed_precision', 'fp16', '--instance_prompt', 'a photo of a [V] dog', '--class_prompt', 'a photo of a dog', '--resolution', '512', '--train_batch_size', '1', '--gradient_accumulation_steps', '2', '--learning_rate', '5e-6', '--report_to', 'wandb', '--lr_scheduler', 'constant', '--gradient_checkpointing', '--use_8bit_adam', '--train_text_encoder', '--lr_warmup_steps', '0', '--max_train_steps', '800', '--checkpointing_steps', '50', '--validation_prompt', 'A photo of a [V] dog in a bucket', '--seed', '0']' returned non-zero exit status 1.
System Info
- diffusers version: 0.28.0.dev0
- Platform: Windows-10-10.0.22631-SP0
- Python version: 3.10.9
- PyTorch version (GPU?): 2.2.2+cu121 (True)
- Huggingface_hub version: 0.22.2
- Transformers version: 4.39.2
- Accelerate version: 0.28.0
- xFormers version: 0.0.25.post1
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
Who can help?
@sayakpaul
I am unable to reproduce. Could you look at this Colab notebook? Are you sure you're running train_dreambooth.py and all the other train_xxx.py scripts in the exact same environment?
@standardAI, thanks for trying to reproduce this. I just pulled the latest diffusers main and got the same error.
Yes, I did run all the train_xxx.py scripts in the same env, and all the others work fine.
From the error it looks like the underlying UNet isn't a compatible one. For this script to work properly, we need to have the UNet in a compatible structure. For example, it should have 4 channels and NOT 3 as the one reflected here.
@sayakpaul, I don't think the UNet is the problem after loading the weights from "jzli/majicMIX-realistic-7". Both in_channels and out_channels are 4. The input images having 3 channels is also correct, because they are in sRGB format. The problem is how these two are connected via conv2d in train_dreambooth.py. However, train_dreambooth_lora.py works fine, and there I don't see a conv2d op applied to a 3-channel tensor by a 4-channel weight. It's really odd.
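For completeness, a small sketch of how that check can be done (assuming the repo has the standard diffusers layout with a unet subfolder):

from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained("jzli/majicMIX-realistic-7", subfolder="unet")
print(unet.config.in_channels, unet.config.out_channels)  # both 4 for this checkpoint
print(unet.conv_in.weight.shape)                          # torch.Size([320, 4, 3, 3])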
I see. Would it be possible for you to do the following?
- Load the UNet from your checkpoint.
- Run a single forward pass on a single training image from your dataset.
This way we should be able to localize the problem and compare it with train_dreambooth_lora.py.
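A rough sketch of what such a check could look like (the image path, prompt, preprocessing, and dtype below are illustrative assumptions, not code from the training script):

import torch
from PIL import Image
from torchvision import transforms
from diffusers import AutoencoderKL, UNet2DConditionModel
from transformers import CLIPTextModel, CLIPTokenizer

model_id = "jzli/majicMIX-realistic-7"
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")

# Same preprocessing idea as the DreamBooth dataset: resize, crop, normalize to [-1, 1]
preprocess = transforms.Compose([
    transforms.Resize(512),
    transforms.CenterCrop(512),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),
])
pixels = preprocess(Image.open("dog/00.jpg").convert("RGB")).unsqueeze(0)  # hypothetical file name

with torch.no_grad():
    # Encode the image into 4-channel latents, as the training loop should
    latents = vae.encode(pixels).latent_dist.sample() * vae.config.scaling_factor
    ids = tokenizer(
        "a photo of a [V] dog", padding="max_length",
        max_length=tokenizer.model_max_length, truncation=True, return_tensors="pt",
    ).input_ids
    hidden = text_encoder(ids)[0]
    timestep = torch.tensor([500], dtype=torch.long)
    pred = unet(latents, timestep, encoder_hidden_states=hidden).sample

print(latents.shape, pred.shape)  # both torch.Size([1, 4, 64, 64])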
I just noticed:
File "C:\DL\diffusers\examples\dreambooth\train_dreambooth.py", line 1440, in <module>
main(args)
File "C:\DL\diffusers\examples\dreambooth\train_dreambooth.py", line 1270, in main
model_pred = unet(
This implies that there are 3 missing lines in your train_dreambooth.py file, where the main(args) line should have been 1443. Could you tell us what those 3 lines are, or whether they are irrelevant?
Sure, they are like this:
# Predict the noise residual
model_pred = unet(
    noisy_model_input, timesteps, encoder_hidden_states, class_labels=class_labels, return_dict=False
)[0]
Thanks for the tip, @sayakpaul. I'll try to debug this way.
I mean that train_dreambooth.py should have 1443 lines of code, but I think yours has 1440. I question this difference. Could you confirm it?
Sorry, I misread. Yes, train_dreambooth.py has 1443 lines. Now the log looks like this:
04/10/2024 07:48:15 - INFO - __main__ - Total optimization steps = 800
Steps: 0%| | 0/800 [00:00<?, ?it/s]unet_forward: sample size is torch.Size([1, 3, 512, 512])
Traceback (most recent call last):
File "C:\DL\diffusers\examples\dreambooth\train_dreambooth.py", line 1443, in <module>
main(args)
File "C:\DL\diffusers\examples\dreambooth\train_dreambooth.py", line 1273, in main
model_pred = unet(
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\accelerate\utils\operations.py", line 822, in forward
return model_forward(*args, **kwargs)
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\accelerate\utils\operations.py", line 810, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\torch\amp\autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
File "C:\DL\diffusers\src\diffusers\models\unets\unet_2d_condition.py", line 1171, in forward
sample = self.conv_in(sample)
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\torch\nn\modules\conv.py", line 462, in forward
return self._conv_forward(input, self.weight, self.bias)
File "C:\WPy64-31090\python-3.10.9.amd64\lib\site-packages\torch\nn\modules\conv.py", line 458, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [320, 4, 3, 3], expected input[1, 3, 512, 512] to have 4 channels, but got 3 channels instead
Steps: 0%| | 0/800 [00:01<?, ?it/s]
Traceback (most recent call last):
I just updated diffusers to main. Now I see a problem:
https://github.com/huggingface/diffusers/blob/37e9d695af0ed1692b8c82fd2af32cb4c0854f81/examples/dreambooth/train_dreambooth.py#L1228-L1233
When I run train_dreambooth.py, the condition if vae is not None: evaluates to False, so model_input becomes pixel_values. This cannot be right. However, when I run train_dreambooth_lora.py, if vae is not None: is True and model_input comes out as [1, 4, 64, 64], i.e. with 4 channels. So the problem is that the VAE encode step is somehow skipped in train_dreambooth.py.
EDIT: Yes, the problem is actually this:
https://github.com/huggingface/diffusers/blob/37e9d695af0ed1692b8c82fd2af32cb4c0854f81/examples/dreambooth/train_dreambooth.py#L936-L941
For some reason model_has_vae(args) returns False, so the VAE is not loaded from the model file. But in train_dreambooth_lora.py this part is done differently:
https://github.com/huggingface/diffusers/blob/37e9d695af0ed1692b8c82fd2af32cb4c0854f81/examples/dreambooth/train_dreambooth_lora.py#L863-L870
If train_dreambooth.py follows train_dreambooth_lora.py in loading the VAE, the problem is fixed.
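For reference, a sketch of the loading pattern that train_dreambooth_lora.py uses (reconstructed from memory, not a verbatim copy): try to load the VAE directly from the checkpoint and fall back to None only if the repo genuinely has no vae subfolder, instead of relying on a model_has_vae() pre-check.

from diffusers import AutoencoderKL

def load_vae(pretrained_model_name_or_path, revision=None):
    # Load the VAE if the checkpoint ships one; otherwise return None
    try:
        return AutoencoderKL.from_pretrained(
            pretrained_model_name_or_path, subfolder="vae", revision=revision
        )
    except OSError:
        # Some checkpoints (e.g. DeepFloyd IF) have no VAE at all
        return None

vae = load_vae("jzli/majicMIX-realistic-7")  # should not be None for this model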
Can you check if the VAE is getting properly loaded?
No, it is not. See my updated post above.
#3454
Hah, they already fixed this issue, but only for train_dreambooth_lora.py. I'll submit a PR fixing train_dreambooth.py.
@sayakpaul , @standardAI , thank you for your help.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
The fix is ready but still hasn't been merged.