lora
Perhaps a simple fix for a docker container inside runpod.io?
Hello,
I am running a Docker container of SD 2.1, but I cannot seem to run LoRA training. Here is the error I get when I run the default shell script in bash:
root@51235cb091e3:/workspace/stable-diffusion-webui/lora#
**bash run_lora_db_w_text.sh**
The following values were not passed to `accelerate launch` and had defaults used instead:
`--num_processes` was set to a value of `1`
`--num_machines` was set to a value of `1`
`--mixed_precision` was set to a value of `'no'`
`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
Before training: Unet First Layer lora up tensor([[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
...,
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]])
Before training: Unet First Layer lora down tensor([[ 2.7575e-02, 4.9739e-05, 3.9807e-02, ..., -7.6583e-02,
-3.2650e-03, 8.8336e-02],
[-1.3945e-02, 3.5099e-02, -1.7838e-02, ..., 1.0271e-03,
1.0573e-02, 5.9847e-02],
[-9.5399e-03, 4.8160e-02, -7.8387e-02, ..., -6.7026e-02,
-4.9318e-02, -1.3817e-02],
[ 6.4708e-02, -7.2586e-02, 2.8864e-02, ..., -1.0646e-01,
2.2544e-02, 2.0882e-03]])
Before training: text encoder First Layer lora up tensor([[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
...,
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]])
Before training: text encoder First Layer lora down tensor([[-0.0550, -0.1452, -0.0761, ..., 0.0092, -0.1626, -0.0285],
[-0.0316, -0.0067, 0.0563, ..., 0.0868, -0.0227, 0.0530],
[ 0.0371, 0.0766, -0.0804, ..., -0.0817, 0.0129, 0.0713],
[-0.0288, -0.0431, 0.0423, ..., -0.0268, 0.0986, 0.0533]])
/venv/lib/python3.10/site-packages/diffusers/configuration_utils.py:195: FutureWarning: It is deprecated to pass a pretrained model name or path to `from_config`.If you were trying to load a scheduler, please use <class 'diffusers.schedulers.scheduling_ddpm.DDPMScheduler'>.from_pretrained(...) instead. Otherwise, please make sure to pass a configuration dictionary instead. This functionality will be removed in v1.0.0.
deprecate("config-passed-as-path", "1.0.0", deprecation_message, standard_warn=False)
***** Running training *****
Num examples = 8
Num batches each epoch = 8
Num Epochs = 1250
Instantaneous batch size per device = 1
Total train batch size (w. parallel, distributed & accumulation) = 1
Gradient Accumulation steps = 1
Total optimization steps = 10000
Steps: 0%| | 4/10000 [00:17<11:40:33, 4.21s/it]Traceback (most recent call last):
File "/workspace/stable-diffusion-webui/lora/train_lora_dreambooth.py", line 958, in <module>
main(args)
File "/workspace/stable-diffusion-webui/lora/train_lora_dreambooth.py", line 784, in main
for step, batch in enumerate(train_dataloader):
File "/venv/lib/python3.10/site-packages/accelerate/data_loader.py", line 383, in __iter__
next_batch = next(dataloader_iter)
File "/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
data = self._next_data()
File "/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
return self._process_data(data)
File "/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
data.reraise()
File "/venv/lib/python3.10/site-packages/torch/_utils.py", line 461, in reraise
raise exception
IsADirectoryError: Caught IsADirectoryError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/venv/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/venv/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/venv/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/workspace/stable-diffusion-webui/lora/train_lora_dreambooth.py", line 110, in __getitem__
instance_image = Image.open(
File "/venv/lib/python3.10/site-packages/PIL/Image.py", line 3092, in open
fp = builtins.open(filename, "rb")
IsADirectoryError: [Errno 21] Is a directory: '/workspace/stable-diffusion-webui/lora/input/.ipynb_checkpoints'
Steps: 0%| | 4/10000 [00:18<12:31:18, 4.51s/it]
Traceback (most recent call last):
File "/venv/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1104, in launch_command
simple_launcher(args)
File "/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/venv/bin/python', 'train_lora_dreambooth.py', '--pretrained_model_name_or_path=stabilityai/stable-diffusion-2-1-base', '--instance_data_dir=./input', '--output_dir=./output', '--instance_prompt=game character a22a', '--train_text_encoder', '--resolution=768', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--learning_rate=1e-4', '--learning_rate_text=5e-5', '--color_jitter', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=10000']' returned non-zero exit status 1.
The script expects only images inside the input folder, but Jupyter has created an `.ipynb_checkpoints` directory there. Just remove that directory and it should work:

!rm -rf input/.ipynb_checkpoints
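A more durable fix is to make the dataset skip anything that isn't an image file, so a stray `.ipynb_checkpoints` (or a `Thumbs.db`, `.DS_Store`, etc.) can't crash training again. This is a hypothetical sketch, not the training script's actual code; the function name and the extension list are assumptions:

```python
# Sketch: build the instance-image list defensively, skipping directories
# (like Jupyter's .ipynb_checkpoints) and non-image files instead of
# handing every directory entry to Image.open().
from pathlib import Path

# Assumed set of extensions; extend as needed for your data.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}

def list_instance_images(instance_data_dir):
    """Return only real image files under instance_data_dir, sorted."""
    root = Path(instance_data_dir)
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )
```

If the dataset class in `train_lora_dreambooth.py` used a filter like this when collecting paths, the worker would never hit the `IsADirectoryError` in the first place.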