lora
Perhaps a simple fix for a docker container inside runpod.io?
Hello,
I am running a Docker container of SD 2.1, but I cannot seem to run LoRA training. Here is the error I get when I run the default shell script in bash:
root@51235cb091e3:/workspace/stable-diffusion-webui/lora#
**bash run_lora_db_w_text.sh**
The following values were not passed to `accelerate launch` and had defaults used instead:
`--num_processes` was set to a value of `1`
`--num_machines` was set to a value of `1`
`--mixed_precision` was set to a value of `'no'`
`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
Before training: Unet First Layer lora up tensor([[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
...,
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]])
Before training: Unet First Layer lora down tensor([[ 2.7575e-02, 4.9739e-05, 3.9807e-02, ..., -7.6583e-02,
-3.2650e-03, 8.8336e-02],
[-1.3945e-02, 3.5099e-02, -1.7838e-02, ..., 1.0271e-03,
1.0573e-02, 5.9847e-02],
[-9.5399e-03, 4.8160e-02, -7.8387e-02, ..., -6.7026e-02,
-4.9318e-02, -1.3817e-02],
[ 6.4708e-02, -7.2586e-02, 2.8864e-02, ..., -1.0646e-01,
2.2544e-02, 2.0882e-03]])
Before training: text encoder First Layer lora up tensor([[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
...,
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]])
Before training: text encoder First Layer lora down tensor([[-0.0550, -0.1452, -0.0761, ..., 0.0092, -0.1626, -0.0285],
[-0.0316, -0.0067, 0.0563, ..., 0.0868, -0.0227, 0.0530],
[ 0.0371, 0.0766, -0.0804, ..., -0.0817, 0.0129, 0.0713],
[-0.0288, -0.0431, 0.0423, ..., -0.0268, 0.0986, 0.0533]])
/venv/lib/python3.10/site-packages/diffusers/configuration_utils.py:195: FutureWarning: It is deprecated to pass a pretrained model name or path to `from_config`.If you were trying to load a scheduler, please use <class 'diffusers.schedulers.scheduling_ddpm.DDPMScheduler'>.from_pretrained(...) instead. Otherwise, please make sure to pass a configuration dictionary instead. This functionality will be removed in v1.0.0.
deprecate("config-passed-as-path", "1.0.0", deprecation_message, standard_warn=False)
***** Running training *****
Num examples = 8
Num batches each epoch = 8
Num Epochs = 1250
Instantaneous batch size per device = 1
Total train batch size (w. parallel, distributed & accumulation) = 1
Gradient Accumulation steps = 1
Total optimization steps = 10000
Steps: 0%| | 4/10000 [00:17<11:40:33, 4.21s/it]Traceback (most recent call last):
File "/workspace/stable-diffusion-webui/lora/train_lora_dreambooth.py", line 958, in <module>
main(args)
File "/workspace/stable-diffusion-webui/lora/train_lora_dreambooth.py", line 784, in main
for step, batch in enumerate(train_dataloader):
File "/venv/lib/python3.10/site-packages/accelerate/data_loader.py", line 383, in __iter__
next_batch = next(dataloader_iter)
File "/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
data = self._next_data()
File "/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
return self._process_data(data)
File "/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
data.reraise()
File "/venv/lib/python3.10/site-packages/torch/_utils.py", line 461, in reraise
raise exception
IsADirectoryError: Caught IsADirectoryError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/venv/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/venv/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/venv/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/workspace/stable-diffusion-webui/lora/train_lora_dreambooth.py", line 110, in __getitem__
instance_image = Image.open(
File "/venv/lib/python3.10/site-packages/PIL/Image.py", line 3092, in open
fp = builtins.open(filename, "rb")
IsADirectoryError: [Errno 21] Is a directory: '/workspace/stable-diffusion-webui/lora/input/.ipynb_checkpoints'
Steps: 0%| | 4/10000 [00:18<12:31:18, 4.51s/it]
Traceback (most recent call last):
File "/venv/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1104, in launch_command
simple_launcher(args)
File "/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/venv/bin/python', 'train_lora_dreambooth.py', '--pretrained_model_name_or_path=stabilityai/stable-diffusion-2-1-base', '--instance_data_dir=./input', '--output_dir=./output', '--instance_prompt=game character a22a', '--train_text_encoder', '--resolution=768', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--learning_rate=1e-4', '--learning_rate_text=5e-5', '--color_jitter', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=10000']' returned non-zero exit status 1.
The script expects only images inside the input folder, but Jupyter has created an `.ipynb_checkpoints` directory there. Just remove that directory and it should work:

!rm -rf input/.ipynb_checkpoints
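A more durable fix is to make the dataset skip anything that isn't an image file, so a stray `.ipynb_checkpoints` (or a `Thumbs.db`, `.DS_Store`, etc.) can't crash training again. This is a hypothetical sketch, not the training script's actual code; the function name and the extension list are assumptions:

```python
# Sketch: build the instance-image list defensively, skipping directories
# (like Jupyter's .ipynb_checkpoints) and non-image files instead of
# handing every directory entry to Image.open().
from pathlib import Path

# Assumed set of extensions; extend as needed for your data.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}

def list_instance_images(instance_data_dir):
    """Return only real image files under instance_data_dir, sorted."""
    root = Path(instance_data_dir)
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )
```

If the dataset class in `train_lora_dreambooth.py` used a filter like this when collecting paths, the worker would never hit the `IsADirectoryError` in the first place.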