ai-toolkit Hidream training error: RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 1 but got size 2 for tensor number 1 in the list.

Bucket sizes for /root/notebooks/viddata/st/images:
1856x928: 94 files
1 buckets made
Generating baseline samples before training
Generating Images: 0%| | 0/10 [00:00<?, ?it/s]
torch.nn.functional.scaled_dot_product_attention does not support output_attentions=True. Falling back to eager attention. This warning can be removed using the argument attn_implementation="eager" when loading the model.
hidream_stranger: 0%| | 0/2000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/root/tools/ai-toolkit/jobs/process/BaseSDTrainProcess.py", line 2027, in run
    loss_dict = self.hook_train_loop(batch_list)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/tools/ai-toolkit/extensions_built_in/sd_trainer/SDTrainer.py", line 1515, in hook_train_loop
    loss = self.train_single_accumulation(batch)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/tools/ai-toolkit/extensions_built_in/sd_trainer/SDTrainer.py", line 1435, in train_single_accumulation
    noise_pred = self.predict_noise(
                 ^^^^^^^^^^^^^^^^^^^
  File "/root/tools/ai-toolkit/extensions_built_in/sd_trainer/SDTrainer.py", line 768, in predict_noise
    return self.sd.predict_noise(
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/root/tools/ai-toolkit/toolkit/models/base_model.py", line 832, in predict_noise
    noise_pred = self.get_noise_prediction(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/tools/ai-toolkit/extensions_built_in/diffusion_models/hidream/hidream_model.py", line 371, in get_noise_prediction
    noise_pred = self.transformer(
                 ^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/ostris/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/ostris/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/tools/ai-toolkit/extensions_built_in/diffusion_models/hidream/src/models/transformers/transformer_hidream_image.py", line 429, in forward
    ids = torch.cat((img_ids, txt_ids), dim=1)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 1 but got size 2 for tensor number 1 in the list.
Batch Items:
/root/notebooks/viddata/st/images/0082.png
/root/notebooks/viddata/st/images/0085.png
Error running job: Sizes of tensors must match except in dimension 1. Expected size 1 but got size 2 for tensor number 1 in the list.

========================================
Result:
0 completed jobs
1 failure
========================================
Traceback (most recent call last):
  File "/root/tools/ai-toolkit/run.py", line 119, in <module>
    main()
  File "/root/tools/ai-toolkit/run.py", line 107, in main
    raise e
  File "/root/tools/ai-toolkit/run.py", line 95, in main
    job.run()
  File "/root/tools/ai-toolkit/jobs/ExtensionJob.py", line 22, in run
    process.run()
  File "/root/tools/ai-toolkit/jobs/process/BaseSDTrainProcess.py", line 2035, in run
    raise e
  File "/root/tools/ai-toolkit/jobs/process/BaseSDTrainProcess.py", line 2027, in run
    loss_dict = self.hook_train_loop(batch_list)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/tools/ai-toolkit/extensions_built_in/sd_trainer/SDTrainer.py", line 1515, in hook_train_loop
    loss = self.train_single_accumulation(batch)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/tools/ai-toolkit/extensions_built_in/sd_trainer/SDTrainer.py", line 1435, in train_single_accumulation
    noise_pred = self.predict_noise(
                 ^^^^^^^^^^^^^^^^^^^
  File "/root/tools/ai-toolkit/extensions_built_in/sd_trainer/SDTrainer.py", line 768, in predict_noise
    return self.sd.predict_noise(
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/root/tools/ai-toolkit/toolkit/models/base_model.py", line 832, in predict_noise
    noise_pred = self.get_noise_prediction(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/tools/ai-toolkit/extensions_built_in/diffusion_models/hidream/hidream_model.py", line 371, in get_noise_prediction
    noise_pred = self.transformer(
                 ^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/ostris/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/ostris/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/tools/ai-toolkit/extensions_built_in/diffusion_models/hidream/src/models/transformers/transformer_hidream_image.py", line 429, in forward
    ids = torch.cat((img_ids, txt_ids), dim=1)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 1 but got size 2 for tensor number 1 in the list.
hidream_stranger: 0%| | 0/2000 [00:01<?, ?it/s]

Apr 18 '25 21:04 oliverban

EDIT: This error occurs when using a batch size higher than 1. I have 80 GB of VRAM, so I wanted training to be slightly faster.

EDIT2: This also happens when using resolutions above 1024, such as 1280 or 1536.

Apr 18 '25 22:04 oliverban

I have the same issue with any image size. This one is from 1536 with 24 GB of VRAM:

RuntimeError: The expanded size of the tensor (4096) must match the existing size (9072) at non-singleton dimension 0. Target sizes: [4096, 3]. Tensor sizes: [9072, 3]

May 12 '25 12:05 Morphious12345

The primary error "RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 1 but got size 2 for tensor number 1 in the list." during torch.cat((img_ids, txt_ids), dim=1) indicates a batch size mismatch between img_ids and txt_ids. Specifically, txt_ids (tensor number 1) has a batch size of 2, while img_ids (tensor number 0) has a batch size of 1. This occurs when training with a batch size greater than 1, especially with non-square images, due to how img_ids was being created with a hardcoded batch size of 1 in HidreamModel.get_noise_prediction.
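The shape rule behind this first error can be sketched without PyTorch. The helper below is hypothetical (it is not part of ai-toolkit or torch) and only mimics the check that concatenation along one dimension performs: every other dimension must match. The token counts 4096 and 256 are made-up illustrative values, and the (batch, tokens, 3) layout is an assumption based on the description above.

```python
def cat_shape(shapes, dim):
    """Return the shape a torch.cat-style concatenation along `dim` would
    produce, or raise if any other dimension differs (hypothetical helper)."""
    first = shapes[0]
    out = list(first)
    for idx, shape in enumerate(shapes[1:], start=1):
        if len(shape) != len(first):
            raise ValueError("all inputs must have the same number of dimensions")
        for d, (expected, got) in enumerate(zip(first, shape)):
            if d != dim and expected != got:
                raise ValueError(
                    f"Sizes of tensors must match except in dimension {dim}. "
                    f"Expected size {expected} but got size {got} "
                    f"for tensor number {idx} in the list."
                )
        out[dim] += shape[dim]
    return tuple(out)


batch_size = 2
img_ids_shape = (1, 4096, 3)          # batch dimension stuck at 1 (the reported problem)
txt_ids_shape = (batch_size, 256, 3)  # follows the real batch size; 256 is illustrative

try:
    cat_shape([img_ids_shape, txt_ids_shape], dim=1)
    mismatch = None
except ValueError as err:
    mismatch = str(err)

# Giving img_ids the same batch dimension as txt_ids removes the mismatch.
fixed_img_ids_shape = (batch_size,) + img_ids_shape[1:]
fixed = cat_shape([fixed_img_ids_shape, txt_ids_shape], dim=1)

print(mismatch)
print(fixed)
```

In real tensor code the equivalent change would be to build or expand img_ids with the actual batch size instead of 1. Whether that alone is enough in HidreamModel.get_noise_prediction is not confirmed in this thread.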

The secondary error, "RuntimeError: The expanded size of the tensor (4096) must match the existing size (9072) at non-singleton dimension 0", encountered with higher resolutions, points to a sequence length mismatch, likely because the number of image patches generated (pH * pW) exceeds the self.model.max_seq that the model's positional embeddings or padding logic expects.
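The numbers in this second error can be checked with a little arithmetic. The sketch below assumes one token per 16x16 pixel area (an 8x VAE downscale followed by 2x2 patches); that stride is an assumption, not something stated in this thread. Under it, 4096 tokens corresponds to a 1024x1024 image, and 9072 is consistent with a bucket such as 1344x1728, which is only one possible factorization.

```python
PIXELS_PER_TOKEN_SIDE = 16  # assumption: 8x VAE downscale times a 2x2 patch
MAX_SEQ = 4096              # the limit quoted in the error message


def num_patches(width, height, stride=PIXELS_PER_TOKEN_SIDE):
    """Number of image tokens (pH * pW) for a given pixel resolution."""
    if width % stride or height % stride:
        raise ValueError(f"{width}x{height} is not a multiple of {stride}")
    return (width // stride) * (height // stride)


def fits(width, height, max_seq=MAX_SEQ):
    """True when the image token count stays within max_seq."""
    return num_patches(width, height) <= max_seq


for w, h in [(1024, 1024), (1344, 1728)]:
    print(f"{w}x{h}: {num_patches(w, h)} tokens, fits={fits(w, h)}")
```

If the assumption holds, any bucket whose area exceeds 1024x1024 would go past the 4096-token limit, which matches the report that resolutions such as 1280 and 1536 fail.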

May 16 '25 13:05 D-Ogi

D-Ogi - Great explanation, but how do we fix the errors?

May 17 '25 12:05 Morphious12345

@Morphious12345, I'll try to make a PR with the dedicated fix. Unfortunately I can't promise any ETA.

May 17 '25 12:05 D-Ogi