RuntimeError: Input type (c10::Half) and bias type (float) should be the same
First of all, congrats on the great work!
I got this error in the middle of training on a T4 in Colab:
***** Running training *****
Num examples = 16
Num batches each epoch = 16
Num Epochs = 938
Instantaneous batch size per device = 1
Total train batch size (w. parallel, distributed & accumulation) = 1
Gradient Accumulation steps = 1
Total optimization steps = 15000
Steps: 53% 8000/15000 [1:06:51<59:27, 1.96it/s, loss=0.215, lr=0.0001]
Fetching 12 files: 100% 12/12 [00:00<00:00, 9088.42it/s]
Steps: 53% 8000/15000 [1:06:54<59:27, 1.96it/s, loss=0.831, lr=0.0001]Traceback (most recent call last):
File "train_lora_dreambooth.py", line 964, in <module>
main(args)
File "train_lora_dreambooth.py", line 864, in main
model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/diffusers/models/unet_2d_condition.py", line 375, in forward
sample = self.conv_in(sample)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/conv.py", line 463, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (c10::Half) and bias type (float) should be the same
Steps: 53% 8000/15000 [1:06:55<58:33, 1.99it/s, loss=0.831, lr=0.0001]
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 1104, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'train_lora_dreambooth.py', '--pretrained_model_name_or_path=stabilityai/stable-diffusion-2-1-base', '--instance_data_dir=./data_example', '--output_dir=./output_example', '--instance_prompt=ghblx style', '--resolution=512', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--learning_rate=1e-4', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=15000', '--mixed_precision=fp16', '--use_8bit_adam', '--gradient_checkpointing']' returned non-zero exit status 1.
Works fine with fewer steps. Not sure why this is happening.
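For context, the message itself just means a half-precision (fp16) tensor reached a layer whose weight/bias are still fp32. A minimal sketch of that mismatch, unrelated to the actual root cause of this issue (assumes a CUDA device; the layer shapes are only illustrative):

import torch
import torch.nn as nn

# The conv layer keeps fp32 parameters, but the incoming sample is fp16
conv = nn.Conv2d(4, 320, kernel_size=3, padding=1).cuda()
sample = torch.randn(1, 4, 64, 64, device="cuda", dtype=torch.float16)

try:
    conv(sample)
except RuntimeError as e:
    print(e)  # Input type (c10::Half) and bias type (float) should be the same

# Under autocast, PyTorch casts both operands to fp16 and the call succeeds
with torch.autocast("cuda"):
    out = conv(sample)
print(out.dtype)  # torch.float16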
FYI, you can work around this bug by either:
- pulling my branch (https://github.com/timh/lora, branch "fix-incorrect-global-steps"), or
- downgrading accelerate, e.g.:
  pip install accelerate==0.12.0
Huh, it happens in the middle of the steps.....
Thanks for the issue! I've merged @timh's branch, so have a look and see if it's still the case!
@timh @cloneofsimo Thanks!
I ran into the same error at inference time earlier this morning while running on a Colab GPU (it had been working fine for the past two days), right after I started using a PyTorch Generator to specify the seed. I didn't see this open issue until now and assumed it was a side effect of my code change (I had only used random seeds for generation before), so I worked around it with autocasting:
import torch
from lora_diffusion import tune_lora_scale

# pipe, SEED, LORA_SCALE, INFERENCE_PROMPT, GUIDANCE are defined earlier in the notebook
device = "cuda"

# Fixed seed via a Generator created on the target device
generator = torch.Generator(device=device)
generator.manual_seed(SEED)

height = 512
width = 512
latents = torch.randn(
    (1, pipe.unet.in_channels, height // 8, width // 8),
    generator=generator,
    device=device,
)

tune_lora_scale(pipe.unet, LORA_SCALE)

# autocast inserts the casts so the fp16 UNet weights and the fp32 latents agree
with torch.autocast(device):
    image = pipe(INFERENCE_PROMPT,
                 num_inference_steps=50,
                 guidance_scale=GUIDANCE,
                 latents=latents
                 ).images[0]
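If you'd rather not autocast the whole call, another option (just a sketch, assuming the pipeline was loaded with fp16 weights, e.g. torch_dtype=torch.float16) is to cast the latents to the UNet's dtype up front:

latents = latents.to(pipe.unet.dtype)  # typically torch.float16 for an fp16 pipeline
image = pipe(INFERENCE_PROMPT,
             num_inference_steps=50,
             guidance_scale=GUIDANCE,
             latents=latents
             ).images[0]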