RuntimeError: Input type (c10::Half) and bias type (float) should be the same

First of all, congrats on the great work!

I got this error in the middle of training on a T4 in Colab:

***** Running training *****
  Num examples = 16
  Num batches each epoch = 16
  Num Epochs = 938
  Instantaneous batch size per device = 1
  Total train batch size (w. parallel, distributed & accumulation) = 1
  Gradient Accumulation steps = 1
  Total optimization steps = 15000
Steps:  53% 8000/15000 [1:06:51<59:27,  1.96it/s, loss=0.215, lr=0.0001]
Fetching 12 files: 100% 12/12 [00:00<00:00, 9088.42it/s]
Steps:  53% 8000/15000 [1:06:54<59:27,  1.96it/s, loss=0.831, lr=0.0001]Traceback (most recent call last):
  File "train_lora_dreambooth.py", line 964, in <module>
    main(args)
  File "train_lora_dreambooth.py", line 864, in main
    model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/models/unet_2d_condition.py", line 375, in forward
    sample = self.conv_in(sample)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (c10::Half) and bias type (float) should be the same
Steps:  53% 8000/15000 [1:06:55<58:33,  1.99it/s, loss=0.831, lr=0.0001]
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'train_lora_dreambooth.py', '--pretrained_model_name_or_path=stabilityai/stable-diffusion-2-1-base', '--instance_data_dir=./data_example', '--output_dir=./output_example', '--instance_prompt=ghblx style', '--resolution=512', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--learning_rate=1e-4', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=15000', '--mixed_precision=fp16', '--use_8bit_adam', '--gradient_checkpointing']' returned non-zero exit status 1.

Works fine with fewer steps. Not sure why this is happening.

qunash avatar Dec 09 '22 19:12 qunash

FYI you can work around this bug by either:

  • pulling my branch (https://github.com/timh/lora, branch "fix-incorrect-global-steps"), or
  • downgrading accelerate, e.g., pip install accelerate==0.12.0

timh avatar Dec 09 '22 23:12 timh

Huh, it happens in the middle of the steps.....

cloneofsimo avatar Dec 10 '22 10:12 cloneofsimo

Thanks for the issue! I've merged @timh's branch, so have a look and see if it's still the case!

cloneofsimo avatar Dec 10 '22 10:12 cloneofsimo

@timh @cloneofsimo Thanks!

qunash avatar Dec 10 '22 15:12 qunash

I ran into the same error at inference time earlier this morning while running on a Colab GPU (it had been working fine for the past two days), after I started using a PyTorch Generator to specify the seed. I didn't see this open issue until now and assumed the error was a side effect of my code change (I had only used random seeds for generation before), so I worked around it with autocasting:

device = "cuda"
generator = torch.Generator(device=device )
generator.manual_seed(SEED)

height = 512
width = 512
latents = torch.randn(
    (1, pipe.unet.in_channels, height // 8, width // 8),
    generator = generator,
    device = device 
)

tune_lora_scale(pipe.unet, LORA_SCALE)
with torch.autocast(device):
  image = pipe(INFERENCE_PROMPT, 
              num_inference_steps=50, 
              guidance_scale=GUIDANCE,
              latents=latents
              ).images[0]
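
A possible alternative (a sketch only, not tested here, using the same pipe/latents variables as above) is to cast the pre-generated latents to the UNet's weight dtype up front instead of wrapping the call in autocast:

# Cast the fp32 latents to the pipeline's (fp16) weight dtype before calling it,
# so no autocast context is needed.
latents = latents.to(device=device, dtype=pipe.unet.dtype)

tune_lora_scale(pipe.unet, LORA_SCALE)
image = pipe(INFERENCE_PROMPT,
             num_inference_steps=50,
             guidance_scale=GUIDANCE,
             latents=latents
             ).images[0]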

virtualramblas avatar Dec 15 '22 17:12 virtualramblas