I think there is something wrong with the new/latest scripts. RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x1536 and 768x3072)

Open gerylavin opened this issue 2 months ago • 2 comments

Describe the bug

I got "RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x1536 and 768x3072)" when I ran "train_dreambooth_lora_flux_advanced.py" from the latest version of diffusers (or v0.35.1). The problem went away when I downgraded to v0.31.0, including all the dependencies. I ran the scripts on Modal (a serverless GPU cloud), using an L40S 48GB and the same training parameters/arguments in both cases.
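
For reading the error itself: in a PyTorch linear layer, mat1 is the input (batch x features) and mat2 is the transposed weight (in_features x out_features), so the message says the layer expects 768 input features but received 1536. The minimal sketch below only parses the message text and does not touch the model; the helper name parse_matmul_error is made up for illustration. It shows that the input is exactly twice as wide as the layer expects, which may hint that two 768-wide pooled embeddings were joined along the feature axis. That reading is an assumption, not a confirmed cause.

```python
import re


def parse_matmul_error(message):
    """Extract ((rows1, cols1), (rows2, cols2)) from a matmul shape error message."""
    match = re.search(r"\((\d+)x(\d+) and (\d+)x(\d+)\)", message)
    if match is None:
        raise ValueError("no shape pair found in message")
    r1, c1, r2, c2 = map(int, match.groups())
    return (r1, c1), (r2, c2)


msg = "RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x1536 and 768x3072)"
(batch, got), (expected, out_features) = parse_matmul_error(msg)

# A matmul needs cols(mat1) == rows(mat2); here 1536 != 768.
ratio = got / expected
print(f"batch={batch}, input features={got}, layer expects={expected}, ratio={ratio}")
# prints: batch=2, input features=1536, layer expects=768, ratio=2.0
```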

Reproduction

I think the GPU description is missing from the output below because I ran "accelerate env" while the image was still being built.

For the latest version of diffusers, I used this config:

Accelerate version: 1.10.0
Platform: Linux-4.4.0-x86_64-with-glibc2.39
accelerate bash location: /usr/local/bin/accelerate
Python version: 3.11.5
Numpy version: 2.3.4
PyTorch version: 2.8.0+cu129
PyTorch accelerator: N/A
System RAM: 167.58 GB
Accelerate default config:
- compute_environment: LOCAL_MACHINE
- distributed_type: NO
- mixed_precision: bf16
- use_cpu: False
- debug: False
- num_processes: 1
- machine_rank: 0
- num_machines: 1
- rdzv_backend: static
- same_network: False
- main_training_function: main
- enable_cpu_affinity: False
- downcast_bf16: False
- tpu_use_cluster: False
- tpu_use_sudo: False

For the older version of diffusers (v0.31.0), I used this config:

Accelerate version: 1.2.1
Platform: Linux-4.4.0-x86_64-with-glibc2.35
accelerate bash location: /usr/local/bin/accelerate
Python version: 3.11.5
Numpy version: 2.3.3
PyTorch version (GPU?): 2.5.1+cu124 (False)
PyTorch XPU available: False
PyTorch NPU available: False
PyTorch MLU available: False
PyTorch MUSA available: False
System RAM: 167.58 GB
Accelerate default config:
- compute_environment: LOCAL_MACHINE
- distributed_type: NO
- mixed_precision: bf16
- use_cpu: False
- debug: False
- num_processes: 1
- machine_rank: 0
- num_machines: 1
- rdzv_backend: static
- same_network: False
- main_training_function: main
- enable_cpu_affinity: False
- downcast_bf16: False
- tpu_use_cluster: False
- tpu_use_sudo: False

Logs

The latest version of diffusers:
10/15/2025 16:56:04 - INFO - __main__ - ***** Running training *****
10/15/2025 16:56:04 - INFO - __main__ -   Num examples = 338
10/15/2025 16:56:04 - INFO - __main__ -   Num batches each epoch = 338
10/15/2025 16:56:04 - INFO - __main__ -   Num Epochs = 10
10/15/2025 16:56:04 - INFO - __main__ -   Instantaneous batch size per device = 1
10/15/2025 16:56:04 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 1
10/15/2025 16:56:04 - INFO - __main__ -   Gradient Accumulation steps = 1
10/15/2025 16:56:04 - INFO - __main__ -   Total optimization steps = 3380
Steps:   0%|          | 0/3380 [00:00<?, ?it/s]
[2025-10-15 16:56:05,548] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
df: /root/.triton/autotune: No such file or directory
[2025-10-15 16:56:09,437] [INFO] [logging.py:107:log_dist] [Rank -1] [TorchCheckpointEngine] Initialized with serialization = False
Traceback (most recent call last):
  File "/root/diffusers/examples/advanced_diffusion_training/train_dreambooth_lora_flux_advanced.py", line 2470, in <module>
    main(args)
  File "/root/diffusers/examples/advanced_diffusion_training/train_dreambooth_lora_flux_advanced.py", line 2224, in main
    model_pred = transformer(
                 ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/accelerate/utils/operations.py", line 818, in forward
    return model_forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/accelerate/utils/operations.py", line 806, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/diffusers/src/diffusers/models/transformers/transformer_flux.py", line 696, in forward
    else self.time_text_embed(timestep, guidance, pooled_projections)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/diffusers/src/diffusers/models/embeddings.py", line 1614, in forward
    pooled_projections = self.text_embedder(pooled_projection)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/diffusers/src/diffusers/models/embeddings.py", line 2207, in forward
    hidden_states = self.linear_1(caption)
                    ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/linear.py", line 125, in forward
    return F.linear(input, self.weight, self.bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x1536 and 768x3072)
Steps:   0%|          | 0/3380 [00:07<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.11/site-packages/accelerate/commands/accelerate_cli.py", line 50, in main
    args.func(args)
  File "/usr/local/lib/python3.11/site-packages/accelerate/commands/launch.py", line 1235, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.11/site-packages/accelerate/commands/launch.py", line 823, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/local/bin/python', 'train_dreambooth_lora_flux_advanced.py', '--pretrained_model_name_or_path=black-forest-labs/FLUX.1-dev', '--instance_data_dir=/root/r41s4', '--token_abstraction=r4is4', '--instance_prompt=a photo of a r4is4 woman', '--class_data_dir=/root/class_images', '--class_prompt=a photo of a woman', '--with_prior_preservation', '--prior_loss_weight=0.3', '--num_class_images=338', '--output_dir=/root/output_lora', '--lora_layers=attn.to_k,attn.to_q,attn.to_v,attn.to_out.0', '--mixed_precision=bf16', '--optimizer=prodigy', '--train_transformer_frac=1', '--train_text_encoder_ti', '--train_text_encoder_ti_frac=.25', '--weighting_scheme=none', '--resolution=1024', '--train_batch_size=1', '--guidance_scale=1', '--repeats=10', '--learning_rate=1.0', '--gradient_accumulation_steps=1', '--rank=16', '--num_train_epochs=10', '--checkpointing_steps=100', '--cache_latents', '--mixed_precision=bf16', '--gradient_checkpointing']' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/pkg/modal/_runtime/container_io_manager.py", line 778, in handle_input_exception
    yield
  File "/pkg/modal/_container_entrypoint.py", line 243, in run_input_sync
    res = io_context.call_finalized_function()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pkg/modal/_runtime/container_io_manager.py", line 197, in call_finalized_function
    res = self.finalized_function.callable(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/training_baru11_flux_full.py", line 159, in mulai_training
    subprocess.run(jalankan_training, cwd="/root/diffusers/examples/advanced_diffusion_training", check=True)
  File "/usr/local/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['accelerate', 'launch', '--config_file', '/root/accelerate_config.yaml', 'train_dreambooth_lora_flux_advanced.py', '--pretrained_model_name_or_path=black-forest-labs/FLUX.1-dev', '--instance_data_dir=/root/r41s4', '--token_abstraction=r4is4', '--instance_prompt=a photo of a r4is4 woman', '--class_data_dir=/root/class_images', '--class_prompt=a photo of a woman', '--with_prior_preservation', '--prior_loss_weight=0.3', '--num_class_images=338', '--output_dir=/root/output_lora', '--lora_layers=attn.to_k,attn.to_q,attn.to_v,attn.to_out.0', '--mixed_precision=bf16', '--optimizer=prodigy', '--train_transformer_frac=1', '--train_text_encoder_ti', '--train_text_encoder_ti_frac=.25', '--weighting_scheme=none', '--resolution=1024', '--train_batch_size=1', '--guidance_scale=1', '--repeats=10', '--learning_rate=1.0', '--gradient_accumulation_steps=1', '--rank=16', '--num_train_epochs=10', '--checkpointing_steps=100', '--cache_latents', '--mixed_precision=bf16', '--gradient_checkpointing']' returned non-zero exit status 1.


The older version of diffusers (v0.31.0):
10/15/2025 17:00:37 - INFO - __main__ - ***** Running training *****
10/15/2025 17:00:37 - INFO - __main__ -   Num examples = 338
10/15/2025 17:00:37 - INFO - __main__ -   Num batches each epoch = 338
10/15/2025 17:00:37 - INFO - __main__ -   Num Epochs = 10
10/15/2025 17:00:37 - INFO - __main__ -   Instantaneous batch size per device = 1
10/15/2025 17:00:37 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 1
10/15/2025 17:00:37 - INFO - __main__ -   Gradient Accumulation steps = 1
10/15/2025 17:00:37 - INFO - __main__ -   Total optimization steps = 3380
Steps:   0%|          | 0/3380 [00:00<?, ?it/s]Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
Steps:   0%|          | 1/3380 [00:05<5:08:01,  5.47s/it, loss=0.582, lr=1]Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
Steps:   0%|          | 2/3380 [00:10<4:53:58,  5.22s/it, loss=0.737, lr=1]Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
Steps:   0%|          | 3/3380 [00:15<4:49:15,  5.14s/it, loss=0.694, lr=1]Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
Steps:   0%|          | 4/3380 [00:20<4:47:07,  5.10s/it, loss=0.518, lr=1]Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
Steps:   0%|          | 5/3380 [00:25<4:45:53,  5.08s/it, loss=0.536, lr=1]Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
Steps:   0%|          | 6/3380 [00:30<4:45:17,  5.07s/it, loss=0.381, lr=1]Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
Steps:   0%|          | 7/3380 [00:35<4:44:57,  5.07s/it, loss=0.692, lr=1]Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
Steps:   0%|          | 8/3380 [00:40<4:44:49,  5.07s/it, loss=1.08, lr=1] Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
Steps:   0%|          | 9/3380 [00:45<4:44:47,  5.07s/it, loss=0.59, lr=1]Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
Steps:   0%|          | 10/3380 [00:50<4:44:35,  5.07s/it, loss=0.687, lr=1]Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
Steps:   0%|          | 11/3380 [00:56<4:44:33,  5.07s/it, loss=0.7, lr=1]  Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
Steps:   0%|          | 12/3380 [01:01<4:44:29,  5.07s/it, loss=0.747, lr=1]Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
Steps:   0%|          | 13/3380 [01:06<4:44:25,  5.07s/it, loss=0.48, lr=1] Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
Steps:   0%|          | 14/3380 [01:11<4:44:16,  5.07s/it, loss=0.448, lr=1]Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
Steps:   0%|          | 15/3380 [01:16<4:44:13,  5.07s/it, loss=0.722, lr=1]Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
Steps:   0%|          | 16/3380 [01:21<4:44:16,  5.07s/it, loss=0.578, lr=1]Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
Steps:   1%|          | 17/3380 [01:26<4:44:09,  5.07s/it, loss=0.683, lr=1]Passing `txt_ids` 3d torch.Tensor is deprecated.Please remove the batch dimension and pass it as a 2d torch Tensor
..............................................

System Info

Since it's kind of complex (for me) to run the CLI in a Modal container, I'm providing the system info as the image definition used for each run:

For the latest version of diffusers, I used this image:

image = (
    modal.Image.from_registry("nvidia/cuda:12.9.1-devel-ubuntu24.04", add_python="3.11")
    .apt_install("git")
    .pip_install("uv==0.8.12", "ninja<=1.13.0")  #==0.5.5
    .run_commands("git clone https://github.com/huggingface/diffusers.git /root/diffusers && cd /root/diffusers && uv pip install --system -e .")
    .uv_pip_install(
        "huggingface_hub[hf_transfer]==0.34.4",  #0.1.8
        "accelerate>=0.31.0,<=1.10.0",
        "transformers>=4.41.2,<=4.55.2",
        "ftfy<=6.2.3",
        "tensorboard<=2.20.0",
        "Jinja2<=3.1.6",
        "peft>=0.11.1,<=0.17.0",
        "sentencepiece<=0.2.1",
        "wheel<=0.41.1",
        "wandb<=0.21.1",
        "bitsandbytes<=0.47.0",
        "datasets<=4.0.0",
        "pyarrow<=21.0.0",
        "prodigyopt<=1.1.2",
        "deepspeed<=0.17.4",
        "xformers<=0.0.32.post2",
        "triton<=3.4.0",
        "torch==2.8.0",
        "torchaudio==2.8.0",
        "torchvision==0.23.0",
        extra_index_url="https://download.pytorch.org/whl/cu129",
    )
    .uv_pip_install("https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.3.14/flash_attn-2.8.2+cu129torch2.8-cp311-cp311-linux_x86_64.whl")
    .run_function(setup_accelerate, gpu="L40S")
    .env({"PYTORCH_CUDA_ALLOC_CONF": "expandable_segments:True"})
    .add_local_dir(DATASET_LOCAL_PATH, DATASET_DIR)
    .add_local_dir(CLASS_LOCAL_PATH, CLASS_DIR)
)

For the older version of diffusers (v0.31.0), I used this image:

image = (
    modal.Image.from_registry("nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04", add_python="3.11")
    .apt_install("git")
    .pip_install("uv==0.5.5")
    .run_commands("git clone -b v0.31.0 https://github.com/huggingface/diffusers.git /root/diffusers && cd /root/diffusers && uv pip install --system -e .")
    .uv_pip_install(
        "huggingface_hub[hf_transfer]==0.26.0",
        "accelerate>=0.31.0,<=1.2.1",
        "transformers>=4.41.2,<=4.47.0",
        "ftfy==6.3.1",
        "tensorboard==2.18.0",
        "Jinja2==3.1.4",
        "peft>=0.11.1,<=0.14.0",
        "sentencepiece<=0.2.0",
        "wheel<=0.44.0",
        "bitsandbytes<=0.44.1",
        "datasets<=3.0.1",
        "pyarrow<=20.0.0",
        "prodigyopt<=1.0",
        "deepspeed<=0.15.3",
        "xformers<=0.0.28.post3",
        "triton<=3.1.0",
        "torch==2.5.1",
        "torchaudio==2.5.1",
        "torchvision==0.20.1",
        extra_index_url="https://download.pytorch.org/whl/cu124",
    )
    .uv_pip_install("flash-attn<=2.7.2.post1", extra_options="--no-build-isolation")
    .run_function(setup_accelerate, gpu="L40S")
    .env({"PYTORCH_CUDA_ALLOC_CONF": "expandable_segments:True"})
    .add_local_dir(DATASET_LOCAL_PATH, DATASET_DIR)
    .add_local_dir(CLASS_LOCAL_PATH, CLASS_DIR)
)
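
Because the version ranges in the images above only bound what gets installed, the resolved versions can differ from what is written there. One way to report them without running a CLI in the container is a small standard-library helper called from inside the training function at run time. This is a minimal sketch: the function names collect_versions and system_report are hypothetical, and it does not report GPU details.

```python
import platform
from importlib import metadata


def collect_versions(packages):
    """Return {distribution name: installed version or 'not installed'}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


def system_report(packages):
    """Build a plain-text report similar in spirit to an environment dump."""
    lines = [
        f"Python version: {platform.python_version()}",
        f"Platform: {platform.platform()}",
    ]
    lines += [f"{name}: {ver}" for name, ver in collect_versions(packages).items()]
    return "\n".join(lines)


# Package list chosen to match the dependencies named in this issue.
print(system_report(["diffusers", "accelerate", "transformers", "peft", "torch"]))
```

Printing this at the start of the training function puts the resolved versions in the same log as the traceback, which makes the two runs easier to compare.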

Who can help?

@sayakpaul

Oct 15 '25 17:10 gerylavin

What's the training command you're using?

Oct 16 '25 03:10 sayakpaul

I'm also running into a shape-mismatch failure while finetuning Stable Diffusion XL for InstructPix2Pix. The error only appears after 836 training steps.

This is the training command I used:

accelerate launch /workspace/furniture-dsi/bin/train_sdxl.py \
  --pretrained_model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
  --train_data_dir datasets/15k_clean \
  --enable_xformers_memory_efficient_attention \
  --resolution 1024 \
  --train_batch_size 8 --dataloader_num_workers 0 \
  --gradient_accumulation_steps 2 \
  --gradient_checkpointing \
  --max_train_steps 20000 \
  --checkpointing_steps 10000 \
  --checkpoints_total_limit 1 \
  --learning_rate 5e-05 \
  --max_grad_norm 1 \
  --lr_warmup_steps 0 \
  --conditioning_dropout_prob 0.1 \
  --mixed_precision fp16 \
  --vae_precision fp32 \
  --seed 42 \
  --original_image_column input_image \
  --edit_prompt_column edit_prompt \
  --edited_image_column edited_image \
  --binary_mask_column mask_image \
  --output_dir furniture-dsi/checkpoints/ip2p_sdxl \
  --report_to wandb \
  --resume_from_checkpoint latest \
  --run_name ip2p_sdxl_1

Error log:

2025-12-01 16:36:31 12/01/2025 16:36:31 - INFO - __main__ - ***** Running training *****
2025-12-01 16:36:31 12/01/2025 16:36:31 - INFO - __main__ -   Num examples = 13386
2025-12-01 16:36:31 12/01/2025 16:36:31 - INFO - __main__ -   Num Epochs = 24
2025-12-01 16:36:31 12/01/2025 16:36:31 - INFO - __main__ -   Instantaneous batch size per device = 8
2025-12-01 16:36:31 12/01/2025 16:36:31 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 16
2025-12-01 16:36:31 12/01/2025 16:36:31 - INFO - __main__ -   Gradient Accumulation steps = 2
2025-12-01 16:36:31 12/01/2025 16:36:31 - INFO - __main__ -   Total optimization steps = 20000
2025-12-01 16:36:31 Checkpoint 'latest' does not exist. Starting a new training run.
2025-12-01 16:36:31 Steps:   4%|████▋                                                                                                          | 836/20000 [1:12:39<27:35:58,  5.18s/it, lr=5e-5, step_loss=0.109]Traceback (most recent call last):
2025-12-01 17:49:10   File "/workspace/furniture-dsi/bin/train_sdxl.py", line 1438, in <module>
2025-12-01 17:49:10     main()
2025-12-01 17:49:10   File "/workspace/furniture-dsi/bin/train_sdxl.py", line 1294, in main
2025-12-01 17:49:10     model_pred = unet(
2025-12-01 17:49:10                  ^^^^^
2025-12-01 17:49:10   File "/venv/main/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
2025-12-01 17:49:10     return self._call_impl(*args, **kwargs)
2025-12-01 17:49:10            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-12-01 17:49:10   File "/venv/main/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
2025-12-01 17:49:10     return forward_call(*args, **kwargs)
2025-12-01 17:49:10            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-12-01 17:49:10   File "/venv/main/lib/python3.12/site-packages/accelerate/utils/operations.py", line 819, in forward
2025-12-01 17:49:10     return model_forward(*args, **kwargs)
2025-12-01 17:49:10            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-12-01 17:49:10   File "/venv/main/lib/python3.12/site-packages/accelerate/utils/operations.py", line 807, in __call__
2025-12-01 17:49:10     return convert_to_fp32(self.model_forward(*args, **kwargs))
2025-12-01 17:49:10                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-12-01 17:49:10   File "/venv/main/lib/python3.12/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
2025-12-01 17:49:10     return func(*args, **kwargs)
2025-12-01 17:49:10            ^^^^^^^^^^^^^^^^^^^^^
2025-12-01 17:49:10   File "/workspace/diffusers/src/diffusers/models/unets/unet_2d_condition.py", line 1147, in forward
2025-12-01 17:49:10     aug_emb = self.get_aug_embed(
2025-12-01 17:49:10               ^^^^^^^^^^^^^^^^^^^
2025-12-01 17:49:10   File "/workspace/diffusers/src/diffusers/models/unets/unet_2d_condition.py", line 979, in get_aug_embed
2025-12-01 17:49:10     aug_emb = self.add_embedding(add_embeds)
2025-12-01 17:49:10               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-12-01 17:49:10   File "/venv/main/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
2025-12-01 17:49:10     return self._call_impl(*args, **kwargs)
2025-12-01 17:49:10            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-12-01 17:49:10   File "/venv/main/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
2025-12-01 17:49:10     return forward_call(*args, **kwargs)
2025-12-01 17:49:10            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-12-01 17:49:10   File "/workspace/diffusers/src/diffusers/models/embeddings.py", line 1298, in forward
2025-12-01 17:49:10     sample = self.linear_1(sample)
2025-12-01 17:49:10              ^^^^^^^^^^^^^^^^^^^^^
2025-12-01 17:49:10   File "/venv/main/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
2025-12-01 17:49:10     return self._call_impl(*args, **kwargs)
2025-12-01 17:49:10            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-12-01 17:49:10   File "/venv/main/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
2025-12-01 17:49:10     return forward_call(*args, **kwargs)
2025-12-01 17:49:10            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-12-01 17:49:10   File "/venv/main/lib/python3.12/site-packages/torch/nn/modules/linear.py", line 134, in forward
2025-12-01 17:49:10     return F.linear(input, self.weight, self.bias)
2025-12-01 17:49:10            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-12-01 17:49:10 RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x7424 and 2816x1280)
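
The numbers in this error fit one possible explanation, sketched below under explicit assumptions. The first assumption is that the UNet concatenates a 1280-wide pooled text embedding with six time ids projected to 256 features each (1280 + 6 x 256 = 2816, which matches the expected width). The second is that train_sdxl.py, which is not shown here, builds the time ids for a fixed batch of 8 even when the final batch of an epoch is smaller. With 13386 examples and a batch size of 8, the last batch has 2 samples, and spreading 8 x 6 time ids over 2 rows gives exactly 7424 features. The function name and default values are illustrative, not taken from the script.

```python
def aug_embed_width(text_dim, time_ids_numel, rows, time_proj_dim=256):
    """Width of the concatenated [text_embeds, time_embeds] tensor.

    Assumes all time ids are flattened, projected to `time_proj_dim`
    features each, and reshaped to `rows` rows (the text batch size).
    """
    time_ids_per_row = time_ids_numel // rows
    return text_dim + time_ids_per_row * time_proj_dim


num_examples, batch_size, grad_accum = 13386, 8, 2  # values from the log above
ids_per_sample = 6  # assumed: original size, crop offset, target size (2 values each)

last_batch = num_examples % batch_size           # size of the final, short batch
full_batches = num_examples // batch_size        # full batches before the short one
steps_before_short = full_batches // grad_accum  # optimizer steps completed by then

expected = aug_embed_width(1280, batch_size * ids_per_sample, batch_size)
observed = aug_embed_width(1280, batch_size * ids_per_sample, last_batch)

print(last_batch, steps_before_short, expected, observed)
# prints: 2 836 2816 7424
```

If this is what is happening, the things to check would be whether the time ids are built from the actual batch size and whether the last incomplete batch should be dropped.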

Dec 02 '25 07:12 SushantReplaci