diffusers icon indicating copy to clipboard operation
diffusers copied to clipboard

train_dreambooth.py not working

Open the-machine-preacher opened this issue 3 years ago • 2 comments

I ran the Google Colab doc using the training datasets provided, but every time I came to the the !accelerate launch train_dreambooth.py step, it failed, with a ZeroDivisionError error. I can't seem to circumvent this, could someone please help?

This is the full error message:

Traceback (most recent call last):
  File "train_dreambooth.py", line 822, in <module>
    main(args)
  File "train_dreambooth.py", line 613, in main
    for batch in tqdm(train_dataloader, desc="Caching latents"):
  File "/usr/local/lib/python3.8/dist-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 721, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "train_dreambooth.py", line 322, in __getitem__
    instance_path, instance_prompt = self.instance_images_path[index % self.num_instance_images]
ZeroDivisionError: integer division or modulo by zero
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'train_dreambooth.py', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--pretrained_vae_name_or_path=stabilityai/sd-vae-ft-mse', '--output_dir=/content/drive/MyDrive/stable_diffusion_weights/zwx', '--revision=fp16', '--with_prior_preservation', '--prior_loss_weight=1.0', '--seed=1337', '--resolution=512', '--train_batch_size=1', '--train_text_encoder', '--mixed_precision=fp16', '--use_8bit_adam', '--gradient_accumulation_steps=1', '--learning_rate=1e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=50', '--sample_batch_size=4', '--max_train_steps=800', '--save_interval=10000', '--save_sample_prompt=photo of zwx dog', '--concepts_list=concepts_list.json']' returned non-zero exit status 1.

the-machine-preacher avatar Dec 06 '22 23:12 the-machine-preacher

Try actually adding images.

Cyberes avatar Dec 09 '22 18:12 Cyberes

It worked for me to upload the images via upload option in the code rather than putting them into the folder. But I'm not sure about the core issue here.

sercanerhan avatar Dec 21 '22 19:12 sercanerhan