
LoHa/LoKr with conv: tensor size mismatch error

Status: Open. rockerBOO opened this issue 1 year ago.

I was testing conv models, and a run with LoHa failed with a tensor size mismatch error. I'm using Kohya's sd-scripts.

Commit: daa559fbd1e095d05619463d72f7f5ae0bd2a493 on the dev branch

Relevant parts of the config:

pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5"
mixed_precision="fp16"
sdpa=true
network_dim=16
network_alpha=8
network_module = "lycoris.kohya"
network_args=[ 
  "algo=loha",
  "preset=unet-convblock-only",
  # "preset=unet-transformer-only", # Comparison without conv
  "dora_wd=true", # tested without and same error
  "rs_lora=true", # tested without and same error
  "dropout=0.5",
  "rank_dropout=0.25",
  "module_dropout=0.25"
]
Traceback (most recent call last):
  File "/mnt/900/builds/sd-scripts/train_network.py", line 1154, in <module>
    trainer.train(args)
  File "/mnt/900/builds/sd-scripts/train_network.py", line 896, in train
    noise_pred = self.call_unet(
  File "/mnt/900/builds/sd-scripts/train_network.py", line 126, in call_unet
    noise_pred = unet(noisy_latents, timesteps, text_conds).sample
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 822, in forward
    return model_forward(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/library/original_unet.py", line 1589, in forward
    sample, res_samples = downsample_block(
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/library/original_unet.py", line 1018, in forward
    hidden_states = torch.utils.checkpoint.checkpoint(create_custom_forward(resnet), hidden_states, temb)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 489, in _fn
    return fn(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 482, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/autograd/function.py", line 553, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 261, in forward
    outputs = run_function(*args)
  File "/mnt/900/builds/sd-scripts/library/original_unet.py", line 1014, in custom_forward
    return module(*inputs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/library/original_unet.py", line 481, in forward
    output_tensor = input_tensor + hidden_states
RuntimeError: The size of tensor a (20) must match the size of tensor b (16) at non-singleton dimension 3

Not a big deal, since I'm only making a conv version of LoHa for analysis purposes.

Thank you!

rockerBOO, May 15 '24 19:05

The same type of error seems to occur with LoKr.

pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5"
mixed_precision="fp16"
sdpa=true
network_dim=16
network_alpha=8
network_module = "lycoris.kohya"
network_args=[ 
  "algo=lokr",
  "preset=unet-convblock-only",
  # "preset=unet-transformer-only", # Comparison without conv
  "dora_wd=true", # tested without and same error
  "rs_lora=true", # tested without and same error
  "dropout=0.5",
  "rank_dropout=0.25",
  "module_dropout=0.25"
]
Traceback (most recent call last):
  File "/mnt/900/builds/sd-scripts/train_network.py", line 1115, in <module>
    trainer.train(args)
  File "/mnt/900/builds/sd-scripts/train_network.py", line 864, in train
    noise_pred = self.call_unet(
  File "/mnt/900/builds/sd-scripts/train_network.py", line 126, in call_unet
    noise_pred = unet(noisy_latents, timesteps, text_conds).sample
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 822, in forward
    return model_forward(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/library/original_unet.py", line 1589, in forward
    sample, res_samples = downsample_block(
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/library/original_unet.py", line 1018, in forward
    hidden_states = torch.utils.checkpoint.checkpoint(create_custom_forward(resnet), hidden_states, temb)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 489, in _fn
    return fn(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 482, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/autograd/function.py", line 553, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 261, in forward
    outputs = run_function(*args)
  File "/mnt/900/builds/sd-scripts/library/original_unet.py", line 1014, in custom_forward
    return module(*inputs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/library/original_unet.py", line 481, in forward
    output_tensor = input_tensor + hidden_states
RuntimeError: The size of tensor a (72) must match the size of tensor b (70) at non-singleton dimension 3

rockerBOO, May 15 '24 19:05

@rockerBOO Does this problem still exist on the latest dev? I recently reconstructed the whole library.

KohakuBlueleaf, May 31 '24 10:05

I realized afterward that you were reconstructing it. I tested with commit 7880753fb7f624e0d03a349db3ecf66b92a15c3e and kohya dev commit https://github.com/kohya-ss/sd-scripts/commit/0d96e10b3e66d5c6c7096fbeb7626c5be2e98809 and got the following:

Traceback (most recent call last):
  File "/mnt/900/builds/sd-scripts/train_network.py", line 1143, in <module>
    trainer.train(args)
  File "/mnt/900/builds/sd-scripts/train_network.py", line 887, in train
    noise_pred = self.call_unet(
  File "/mnt/900/builds/sd-scripts/train_network.py", line 126, in call_unet
    noise_pred = unet(noisy_latents, timesteps, text_conds).sample
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 822, in forward
    return model_forward(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/library/original_unet.py", line 1589, in forward
    sample, res_samples = downsample_block(
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/library/original_unet.py", line 1018, in forward
    hidden_states = torch.utils.checkpoint.checkpoint(create_custom_forward(resnet), hidden_states, temb)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 451, in _fn
    return fn(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 36, in inner
    return fn(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 487, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 262, in forward
    outputs = run_function(*args)
  File "/mnt/900/builds/sd-scripts/library/original_unet.py", line 1014, in custom_forward
    return module(*inputs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/900/builds/sd-scripts/library/original_unet.py", line 481, in forward
    output_tensor = input_tensor + hidden_states
RuntimeError: The size of tensor a (128) must match the size of tensor b (126) at non-singleton dimension 3

Relevant parts of the config:

network_module = "lycoris.kohya"
network_args = [ 
  "algo=loha", 
  "preset=unet-convblock-only", 
  "dora_wd=true",
  "rs_lora=true",
  "dropout=0.3",
  "rank_dropout=0.15",
  "module_dropout=0.15",
]

pip list | grep lycoris
lycoris-lora              3.0.0.dev6   /mnt/900/builds/sd-scripts/LyCORIS

I can provide the full config if you can't reproduce it. Thanks!

rockerBOO, May 31 '24 18:05

This looks like a bug on the kohya side. Dimension 3 here is the width.

It could be a problem with the bucket resolution steps.
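To make the "dimension 3 is width" point concrete: the UNet's conv feature maps use PyTorch's (batch, channels, height, width) layout, so index 3 is the width, and adding two such tensors whose widths differ raises exactly the kind of RuntimeError seen in the tracebacks. The sketch below is a minimal standalone reproduction of the error message only; the helper name `residual_add_error` and all tensor sizes are hypothetical, with widths 72 and 70 chosen to mirror the LoKr error rather than taken from the actual run.

```python
import torch


def residual_add_error(width_a: int, width_b: int) -> str:
    """Return the RuntimeError message from adding two NCHW tensors whose widths may differ."""
    # Hypothetical feature maps laid out as (batch, channels, height, width).
    input_tensor = torch.zeros(1, 4, 8, width_a)
    hidden_states = torch.zeros(1, 4, 8, width_b)
    try:
        # The same kind of residual addition as original_unet.py, line 481.
        _ = input_tensor + hidden_states
    except RuntimeError as err:
        return str(err)
    return ""  # widths matched, so no error was raised


# Index 3 of an NCHW shape is the width.
print(torch.zeros(1, 4, 8, 72).shape[3])  # 72
print(residual_add_error(72, 70))
```

Because only the width differs, the error points at dimension 3; a height mismatch would instead be reported at dimension 2.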

KohakuBlueleaf, Jun 02 '24 18:06

I'm trying to figure out what could be causing it, since it only happens on the dev version of this repo, and only for LoHa/LoKr with convolution. When using Kohya with conv, it works, but I'm not sure how to further isolate what makes dev behave differently. I can dig into the cause myself, but is there a particular place I should look?

Maybe I could list the dimensions of my dataset files after processing, which might show whether it's a bucket-related issue. Or I could build a non-bucketed dataset to compare. I'll try some of these in a few days.
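Another way to isolate the problem, independent of the dataset, is to record spatial shapes around every conv layer during a single forward pass. The sketch below is a hypothetical diagnostic, not part of sd-scripts or LyCORIS: the function name `find_width_changing_convs` and the toy model are made up for illustration. It uses `torch.nn.Module.register_forward_hook` on every `nn.Conv2d` and reports any stride-1 conv whose output width differs from its input width. For reference, a conv's output width is floor((W + 2p - k) / s) + 1, so a 3x3 conv preserves width only with `padding=1`; with `padding=0` it drops the width by 2 (e.g. 72 to 70). Whether that is what happens here is an assumption to test, not something the thread establishes.

```python
import torch
import torch.nn as nn


def find_width_changing_convs(model: nn.Module, *inputs) -> list:
    """Run one forward pass; return (name, in_width, out_width) for stride-1 Conv2d layers that change width."""
    mismatches = []
    handles = []

    def make_hook(name):
        def hook(module, args, output):
            in_w, out_w = args[0].shape[3], output.shape[3]
            # Only stride-1 convs should preserve width; strided downsamplers are skipped.
            if module.stride[1] == 1 and in_w != out_w:
                mismatches.append((name, in_w, out_w))
        return hook

    for name, module in model.named_modules():
        if isinstance(module, nn.Conv2d):
            handles.append(module.register_forward_hook(make_hook(name)))
    try:
        with torch.no_grad():
            model(*inputs)
    except RuntimeError as err:
        # A shape error later in the pass (like the residual add) keeps earlier records.
        print(f"forward pass failed: {err}")
    finally:
        for handle in handles:
            handle.remove()  # leave the model without the diagnostic hooks
    return mismatches


# Hypothetical toy model: the second conv deliberately omits padding=1.
toy = nn.Sequential(
    nn.Conv2d(4, 8, kernel_size=3, padding=1),  # width 72 -> 72
    nn.Conv2d(8, 8, kernel_size=3, padding=0),  # width 72 -> 70
)
print(find_width_changing_convs(toy, torch.zeros(1, 4, 64, 72)))  # [('1', 72, 70)]
```

On a real run, `model` would be the UNet and the inputs the same `(noisy_latents, timesteps, text_conds)` passed in `call_unet`; comparing the report with and without the LyCORIS network applied would show whether a wrapped conv, rather than the bucket resolution, changes the width.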

rockerBOO, Jun 02 '24 23:06