LoHa/LoKr with conv: tensor size mismatch error
I was testing conv models and ran one with LoHa, which errored with a tensor size mismatch. This is with Kohya's sd-scripts.
Commit: daa559fbd1e095d05619463d72f7f5ae0bd2a493 on the dev branch
Relevant parts of the config:
pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5"
mixed_precision="fp16"
sdpa=true
network_dim=16
network_alpha=8
network_module = "lycoris.kohya"
network_args=[
"algo=loha",
"preset=unet-convblock-only",
# "preset=unet-transformer-only", # Comparision without conv
"dora_wd=true", # tested without and same error
"rs_lora=true", # tested without and same error
"dropout=0.5",
"rank_dropout=0.25",
"module_dropout=0.25"
]
Traceback (most recent call last):
File "/mnt/900/builds/sd-scripts/train_network.py", line 1154, in <module>
trainer.train(args)
File "/mnt/900/builds/sd-scripts/train_network.py", line 896, in train
noise_pred = self.call_unet(
File "/mnt/900/builds/sd-scripts/train_network.py", line 126, in call_unet
noise_pred = unet(noisy_latents, timesteps, text_conds).sample
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 822, in forward
return model_forward(*args, **kwargs)
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
File "/mnt/900/builds/sd-scripts/library/original_unet.py", line 1589, in forward
sample, res_samples = downsample_block(
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/mnt/900/builds/sd-scripts/library/original_unet.py", line 1018, in forward
hidden_states = torch.utils.checkpoint.checkpoint(create_custom_forward(resnet), hidden_states, temb)
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 489, in _fn
return fn(*args, **kwargs)
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
return fn(*args, **kwargs)
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 482, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/autograd/function.py", line 553, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 261, in forward
outputs = run_function(*args)
File "/mnt/900/builds/sd-scripts/library/original_unet.py", line 1014, in custom_forward
return module(*inputs)
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/mnt/900/builds/sd-scripts/library/original_unet.py", line 481, in forward
output_tensor = input_tensor + hidden_states
RuntimeError: The size of tensor a (20) must match the size of tensor b (16) at non-singleton dimension 3
Not a big deal, as I generally only do this to make a conv version of LoHa for analysis purposes.
Thank you!
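For reference, here is a minimal standalone sketch of the failure mode this RuntimeError describes: the ResNet block adds its input to the conv branch output elementwise, so if a wrapped 3x3 conv effectively loses its padding, the width shrinks by 2 and the residual add raises exactly this kind of error. This only illustrates the error type (the shapes and the missing-padding scenario are hypothetical), not the confirmed cause here:

import torch
import torch.nn as nn

x = torch.randn(1, 320, 64, 64)  # (batch, channels, height, width)

# A padded 3x3 conv keeps the spatial size, so the residual add works.
conv_ok = nn.Conv2d(320, 320, kernel_size=3, padding=1)
print((x + conv_ok(x)).shape)  # torch.Size([1, 320, 64, 64])

# Hypothetical failure: if the wrapped conv ends up without its padding,
# the output is 2 narrower and the residual add fails with the same
# "size of tensor a ... must match ... at non-singleton dimension 3" error.
conv_bad = nn.Conv2d(320, 320, kernel_size=3, padding=0)
x + conv_bad(x)  # RuntimeError: The size of tensor a (64) must match the size of tensor b (62) ...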
Same error type on LoKr, it seems.
pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5"
mixed_precision="fp16"
sdpa=true
network_dim=16
network_alpha=8
network_module = "lycoris.kohya"
network_args=[
"algo=lokr",
"preset=unet-convblock-only",
# "preset=unet-transformer-only", # Comparision without conv
"dora_wd=true", # tested without and same error
"rs_lora=true", # tested without and same error
"dropout=0.5",
"rank_dropout=0.25",
"module_dropout=0.25"
]
Traceback (most recent call last):
File "/mnt/900/builds/sd-scripts/train_network.py", line 1115, in <module>
trainer.train(args)
File "/mnt/900/builds/sd-scripts/train_network.py", line 864, in train
noise_pred = self.call_unet(
File "/mnt/900/builds/sd-scripts/train_network.py", line 126, in call_unet
noise_pred = unet(noisy_latents, timesteps, text_conds).sample
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 822, in forward
return model_forward(*args, **kwargs)
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
File "/mnt/900/builds/sd-scripts/library/original_unet.py", line 1589, in forward
sample, res_samples = downsample_block(
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/mnt/900/builds/sd-scripts/library/original_unet.py", line 1018, in forward
hidden_states = torch.utils.checkpoint.checkpoint(create_custom_forward(resnet), hidden_states, temb)
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 489, in _fn
return fn(*args, **kwargs)
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
return fn(*args, **kwargs)
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 482, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/autograd/function.py", line 553, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 261, in forward
outputs = run_function(*args)
File "/mnt/900/builds/sd-scripts/library/original_unet.py", line 1014, in custom_forward
return module(*inputs)
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/mnt/900/builds/sd-scripts/library/original_unet.py", line 481, in forward
output_tensor = input_tensor + hidden_states
RuntimeError: The size of tensor a (72) must match the size of tensor b (70) at non-singleton dimension 3
@rockerBOO Does this problem still exist in the latest dev? I completely reconstructed the whole library recently.
I realized afterwards that you were reconstructing it. I tested with commit 7880753fb7f624e0d03a349db3ecf66b92a15c3e and kohya dev commit https://github.com/kohya-ss/sd-scripts/commit/0d96e10b3e66d5c6c7096fbeb7626c5be2e98809 and got the following:
Traceback (most recent call last):
File "/mnt/900/builds/sd-scripts/train_network.py", line 1143, in <module>
trainer.train(args)
File "/mnt/900/builds/sd-scripts/train_network.py", line 887, in train
noise_pred = self.call_unet(
File "/mnt/900/builds/sd-scripts/train_network.py", line 126, in call_unet
noise_pred = unet(noisy_latents, timesteps, text_conds).sample
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 822, in forward
return model_forward(*args, **kwargs)
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
File "/mnt/900/builds/sd-scripts/library/original_unet.py", line 1589, in forward
sample, res_samples = downsample_block(
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/mnt/900/builds/sd-scripts/library/original_unet.py", line 1018, in forward
hidden_states = torch.utils.checkpoint.checkpoint(create_custom_forward(resnet), hidden_states, temb)
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 451, in _fn
return fn(*args, **kwargs)
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 36, in inner
return fn(*args, **kwargs)
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 487, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 262, in forward
outputs = run_function(*args)
File "/mnt/900/builds/sd-scripts/library/original_unet.py", line 1014, in custom_forward
return module(*inputs)
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/mnt/900/builds/sd-scripts/library/original_unet.py", line 481, in forward
output_tensor = input_tensor + hidden_states
RuntimeError: The size of tensor a (128) must match the size of tensor b (126) at non-singleton dimension 3
network_module = "lycoris.kohya"
network_args = [
"algo=loha",
"preset=unet-convblock-only",
"dora_wd=true",
"rs_lora=true",
"dropout=0.3",
"rank_dropout=0.15",
"module_dropout=0.15",
]
pip list | grep lycoris
lycoris-lora 3.0.0.dev6 /mnt/900/builds/sd-scripts/LyCORIS
I can provide a full config if you can't reproduce it. Thanks.
This looks like a bug on the kohya side. Dim 3 here is the width.
It could be a problem with the bucket resolution steps.
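For context, a minimal sketch of what the bucket resolution step constrains (assumptions: SD1.5's 8x VAE compression and three stride-2, kernel-3, padding-1 downsample convs in the UNet; the bucket widths below are made-up examples): widths that are multiples of 64 divide cleanly all the way down, while others produce odd latent widths partway through.

# Sketch: how an image/bucket width propagates through the SD1.5 VAE (8x
# compression) and the UNet's stride-2 down blocks. The example bucket
# widths are hypothetical.
def latent_widths(image_width: int, downsamples: int = 3) -> list[int]:
    w = image_width // 8                  # VAE compresses spatially by 8x
    widths = [w]
    for _ in range(downsamples):          # stride-2, kernel-3, padding-1 convs
        w = (w + 1) // 2                  # output width rounds up (ceil)
        widths.append(w)
    return widths

for bucket_width in (1024, 1008, 576):
    print(bucket_width, latent_widths(bucket_width))
# 1024 [128, 64, 32, 16]   divides cleanly
# 1008 [126, 63, 32, 16]   odd width after the first downsample
# 576  [72, 36, 18, 9]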
I'm trying to figure out what could be causing it in general, since it only happens on the dev version of this repo, and only for LoHa/LoKr with convolution. When using Kohya with conv it works, but I'm not sure how to further isolate what dev is doing differently. I can poke around to find the cause, but is there any particular place to look to narrow it down?
Maybe I could list the dimensions of my dataset files after processing? That might help indicate whether it's a bucket-related issue (a rough sketch of that is below). Or maybe I could make a non-bucketed dataset to compare against. I will try to address some of these in a few days.
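A rough sketch of that dimension listing (assumptions: a flat "dataset" directory of images, a 64-pixel step matching sd-scripts' default bucket_reso_steps, and simple floor rounding rather than sd-scripts' actual bucketing/cropping logic):

# Rough approximation: print each image's size and the bucket it would
# round down to, flagging sizes that are not on the bucket step. This is
# not sd-scripts' exact bucketing algorithm, just a quick sanity check.
from pathlib import Path
from PIL import Image

DATASET_DIR = Path("dataset")   # hypothetical path
BUCKET_STEP = 64                # assumed to match bucket_reso_steps

for path in sorted(DATASET_DIR.iterdir()):
    if path.suffix.lower() not in {".png", ".jpg", ".jpeg", ".webp"}:
        continue
    with Image.open(path) as img:
        w, h = img.size
    bw, bh = (w // BUCKET_STEP) * BUCKET_STEP, (h // BUCKET_STEP) * BUCKET_STEP
    flag = "" if (w % BUCKET_STEP == 0 and h % BUCKET_STEP == 0) else "  <- not on bucket step"
    print(f"{path.name}: {w}x{h} -> {bw}x{bh}{flag}")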