Hidream training error: RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 1 but got size 2 for tensor number 1 in the list.
Bucket sizes for /root/notebooks/viddata/st/images:
1856x928: 94 files
1 buckets made
Generating baseline samples before training
Generating Images: 0%| | 0/10 [00:00<?, ?it/s]torch.nn.functional.scaled_dot_product_attention does not support output_attentions=True. Falling back to eager attention. This warning can be removed using the argument attn_implementation="eager" when loading the model.
hidream_stranger: 0%| | 0/2000 [00:00<?, ?it/s]Traceback (most recent call last):
File "/root/tools/ai-toolkit/jobs/process/BaseSDTrainProcess.py", line 2027, in run
loss_dict = self.hook_train_loop(batch_list)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/tools/ai-toolkit/extensions_built_in/sd_trainer/SDTrainer.py", line 1515, in hook_train_loop
loss = self.train_single_accumulation(batch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/tools/ai-toolkit/extensions_built_in/sd_trainer/SDTrainer.py", line 1435, in train_single_accumulation
noise_pred = self.predict_noise(
^^^^^^^^^^^^^^^^^^^
File "/root/tools/ai-toolkit/extensions_built_in/sd_trainer/SDTrainer.py", line 768, in predict_noise
return self.sd.predict_noise(
^^^^^^^^^^^^^^^^^^^^^^
File "/root/tools/ai-toolkit/toolkit/models/base_model.py", line 832, in predict_noise
noise_pred = self.get_noise_prediction(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/tools/ai-toolkit/extensions_built_in/diffusion_models/hidream/hidream_model.py", line 371, in get_noise_prediction
noise_pred = self.transformer(
^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/ostris/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/ostris/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/tools/ai-toolkit/extensions_built_in/diffusion_models/hidream/src/models/transformers/transformer_hidream_image.py", line 429, in forward
ids = torch.cat((img_ids, txt_ids), dim=1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 1 but got size 2 for tensor number 1 in the list.
Batch Items:
- /root/notebooks/viddata/st/images/0082.png
- /root/notebooks/viddata/st/images/0085.png
Error running job: Sizes of tensors must match except in dimension 1. Expected size 1 but got size 2 for tensor number 1 in the list.
======================================== Result:
- 0 completed jobs
- 1 failure
========================================
Traceback (most recent call last):
File "/root/tools/ai-toolkit/run.py", line 119, in <module>
main()
File "/root/tools/ai-toolkit/run.py", line 107, in main
raise e
File "/root/tools/ai-toolkit/run.py", line 95, in main
job.run()
File "/root/tools/ai-toolkit/jobs/ExtensionJob.py", line 22, in run
process.run()
File "/root/tools/ai-toolkit/jobs/process/BaseSDTrainProcess.py", line 2035, in run
raise e
File "/root/tools/ai-toolkit/jobs/process/BaseSDTrainProcess.py", line 2027, in run
loss_dict = self.hook_train_loop(batch_list)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/tools/ai-toolkit/extensions_built_in/sd_trainer/SDTrainer.py", line 1515, in hook_train_loop
loss = self.train_single_accumulation(batch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/tools/ai-toolkit/extensions_built_in/sd_trainer/SDTrainer.py", line 1435, in train_single_accumulation
noise_pred = self.predict_noise(
^^^^^^^^^^^^^^^^^^^
File "/root/tools/ai-toolkit/extensions_built_in/sd_trainer/SDTrainer.py", line 768, in predict_noise
return self.sd.predict_noise(
^^^^^^^^^^^^^^^^^^^^^^
File "/root/tools/ai-toolkit/toolkit/models/base_model.py", line 832, in predict_noise
noise_pred = self.get_noise_prediction(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/tools/ai-toolkit/extensions_built_in/diffusion_models/hidream/hidream_model.py", line 371, in get_noise_prediction
noise_pred = self.transformer(
^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/ostris/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/ostris/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/tools/ai-toolkit/extensions_built_in/diffusion_models/hidream/src/models/transformers/transformer_hidream_image.py", line 429, in forward
ids = torch.cat((img_ids, txt_ids), dim=1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 1 but got size 2 for tensor number 1 in the list.
hidream_stranger: 0%| | 0/2000 [00:01<?, ?it/s]
EDIT: This error comes from using a batch size higher than 1. I have 80 GB of VRAM, so I wanted training to be slightly faster.
EDIT2: This also happens with resolutions above 1024, such as 1280 or 1536.
I have the same issue with any image size. This one is at 1536 resolution, on 24 GB of VRAM:
RuntimeError: The expanded size of the tensor (4096) must match the existing size (9072) at non-singleton dimension 0. Target sizes: [4096, 3]. Tensor sizes: [9072, 3]
The primary error, "RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 1 but got size 2 for tensor number 1 in the list.", raised by torch.cat((img_ids, txt_ids), dim=1), indicates a batch-size mismatch between img_ids and txt_ids. torch.cat requires every dimension except the concatenation dimension (here dim 1, the sequence) to match, but txt_ids (tensor number 1) has a batch size of 2 while img_ids (tensor number 0) has a batch size of 1. This occurs when training with a batch size greater than 1, especially with non-square images, because img_ids is created with a hardcoded batch size of 1 in HidreamModel.get_noise_prediction.
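A minimal sketch of the mismatch and of a batch-aware fix, assuming PyTorch and FLUX-style (0, row, column) position ids. make_img_ids and the small shapes are hypothetical and not the actual ai-toolkit or HiDream code; the point is only that img_ids must carry the real batch size before it is concatenated with txt_ids along dim=1:

```python
import torch


def make_img_ids(batch_size: int, pH: int, pW: int) -> torch.Tensor:
    """Hypothetical helper: (0, row, col) position ids for a pH x pW patch grid,
    repeated for every sample instead of using a hardcoded batch of 1."""
    ids = torch.zeros(pH, pW, 3)
    ids[..., 1] = torch.arange(pH, dtype=torch.float32)[:, None]  # row of each patch
    ids[..., 2] = torch.arange(pW, dtype=torch.float32)[None, :]  # column of each patch
    # Flatten the grid into a sequence and broadcast it over the batch dimension.
    return ids.reshape(1, pH * pW, 3).expand(batch_size, -1, -1)


batch_size, pH, pW, txt_len = 2, 4, 8, 5
txt_ids = torch.zeros(batch_size, txt_len, 3)  # text ids already follow the real batch

# Reproduce the reported failure: img_ids built for a batch of 1.
bad_img_ids = make_img_ids(1, pH, pW)
try:
    torch.cat((bad_img_ids, txt_ids), dim=1)
except RuntimeError as err:
    print("mismatch:", err)

# Building img_ids with the batch size of txt_ids lets the concatenation succeed.
img_ids = make_img_ids(txt_ids.shape[0], pH, pW)
ids = torch.cat((img_ids, txt_ids), dim=1)
print(ids.shape)  # torch.Size([2, 37, 3])
```

expand returns a view without copying memory; if downstream code writes into img_ids in place, repeat (which copies) would be the safer choice.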
The secondary error, "RuntimeError: The expanded size of the tensor (4096) must match the existing size (9072) at non-singleton dimension 0", encountered at higher resolutions, points to a sequence-length mismatch: likely the number of image patches generated (pH * pW, here 9072) exceeds self.model.max_seq (here 4096), the length that the model's positional embeddings or padding logic expects.
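The patch-count arithmetic behind this can be checked without a GPU. This sketch assumes an 8x VAE downsampling factor and 2x2 patches (so each patch covers 16x16 pixels) and takes max_seq = 4096 from the target size in the error message; num_patches and check_bucket are hypothetical helpers, not part of ai-toolkit:

```python
# Assumptions (not taken from the ai-toolkit source): 8x VAE downsampling and
# 2x2 transformer patches, so one image token covers 16x16 pixels.
VAE_SCALE = 8
PATCH_SIZE = 2
MAX_SEQ = 4096  # target size reported in the "expanded size" error


def num_patches(width: int, height: int) -> int:
    """Number of image tokens (pH * pW) for a width x height bucket."""
    pW = width // VAE_SCALE // PATCH_SIZE
    pH = height // VAE_SCALE // PATCH_SIZE
    return pH * pW


def check_bucket(width: int, height: int, max_seq: int = MAX_SEQ) -> None:
    """Hypothetical pre-flight check: fail early with a readable message instead
    of an opaque RuntimeError deep inside the transformer's forward pass."""
    n = num_patches(width, height)
    if n > max_seq:
        raise ValueError(
            f"{width}x{height} gives {n} image patches, more than max_seq={max_seq}"
        )


for w, h in [(1024, 1024), (1536, 1536), (1856, 928)]:
    print(f"{w}x{h}: {num_patches(w, h)} patches")
# 1024x1024: 4096 patches
# 1536x1536: 9216 patches
# 1856x928: 6728 patches
```

Under these assumptions, 1024x1024 fills max_seq exactly, which fits the report that resolutions above 1024 fail, and the 1856x928 bucket from the first log would also exceed the limit.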
@D-Ogi, great explanation, but how do I fix these errors?
@Morphious12345, I'll try to make a PR with a dedicated fix. Unfortunately, I can't promise an ETA.