sd-scripts
MultipleInvalid: extra keys not allowed @ data['datasets'][0]['subsets'][1]['is_reg']
For some reason I cannot use the is_reg parameter for DreamBooth-style training.
I'm using the latest commit from the dev branch.
My dataset config is the following:
[general]
shuffle_caption = false
caption_extension = ".txt"
keep_tokens = 1
# This is a DreamBooth-style dataset
[[datasets]]
resolution = [1024, 1280]
batch_size = 1
enable_bucket = true
bucket_no_upscale = true
[[datasets.subsets]]
image_dir = "/path/to/images/"
conditioning_data_dir = "/path/to/masks/"
num_repeats = 63
[[datasets.subsets]]
is_reg = true
image_dir = "/path/to/reg_images/"
conditioning_data_dir = "/path/to/reg_masks/"
cache_info = true
num_repeats = 1
When I hit "Start training" I get the following error:
WARNING clip_skip will be unexpected / sdxl_train_util.py:352
(clip_skip does not work with SDXL training)
2024-07-16 23:41:45 INFO prepare tokenizers sdxl_train_util.py:138
2024-07-16 23:41:46 INFO update token length: 75 sdxl_train_util.py:163
INFO Load dataset config from /srv/shared/mirandakerrProject/SOURCE/KOHYA/combined_dataset_dreambooth.toml sdxl_train.py:133
WARNING ignore following options because config file is found: train_data_dir, in_json sdxl_train.py:137
ERROR Invalid user config (the user config format seems to be incorrect) config_util.py:373
Traceback (most recent call last):
File "/srv/shared/AI/LoraTraining/kohya_ss/sd-scripts/sdxl_train.py", line 948, in <module>
train(args)
File "/srv/shared/AI/LoraTraining/kohya_ss/sd-scripts/sdxl_train.py", line 169, in train
blueprint = blueprint_generator.generate(user_config, args, tokenizer=[tokenizer1, tokenizer2])
File "/srv/shared/AI/LoraTraining/kohya_ss/sd-scripts/library/config_util.py", line 407, in generate
sanitized_user_config = self.sanitizer.sanitize_user_config(user_config)
File "/srv/shared/AI/LoraTraining/kohya_ss/sd-scripts/library/config_util.py", line 370, in sanitize_user_config
return self.user_config_validator(user_config)
File "/srv/shared/AI/LoraTraining/kohya_ss/venv/lib/python3.10/site-packages/voluptuous/schema_builder.py", line 272, in __call__
return self._compiled([], data)
File "/srv/shared/AI/LoraTraining/kohya_ss/venv/lib/python3.10/site-packages/voluptuous/schema_builder.py", line 595, in validate_dict
return base_validate(path, iteritems(data), out)
File "/srv/shared/AI/LoraTraining/kohya_ss/venv/lib/python3.10/site-packages/voluptuous/schema_builder.py", line 433, in validate_mapping
raise er.MultipleInvalid(errors)
voluptuous.error.MultipleInvalid: extra keys not allowed @ data['datasets'][0]['subsets'][1]['is_reg']
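A minimal sketch of why the validator rejects the key: when a subset contains conditioning_data_dir, sd-scripts selects a subset schema that simply does not define is_reg, and voluptuous reports any key outside the schema as "extra keys not allowed". The plain-Python validator and the key set below are hypothetical simplifications for illustration, not the actual schema in config_util.py:

```python
# Hypothetical, simplified stand-in for the schema check in config_util.py.
# The allowed-key set is illustrative; the real schema is built with voluptuous.
ALLOWED_CONTROLNET_SUBSET_KEYS = {
    "image_dir", "conditioning_data_dir", "num_repeats", "cache_info",
}

def sanitize_subset(subset: dict) -> dict:
    """Reject any key that the conditioning-data subset schema does not define."""
    extra = set(subset) - ALLOWED_CONTROLNET_SUBSET_KEYS
    if extra:
        raise ValueError(f"extra keys not allowed: {sorted(extra)}")
    return subset

reg_subset = {
    "is_reg": True,  # not part of the conditioning-data subset schema
    "image_dir": "/path/to/reg_images/",
    "conditioning_data_dir": "/path/to/reg_masks/",
    "num_repeats": 1,
}

try:
    sanitize_subset(reg_subset)
except ValueError as e:
    print(e)  # extra keys not allowed: ['is_reg']
```

The real error path is the same shape: the schema for subsets with conditioning data has no is_reg entry, so the key is flagged as extra rather than invalid.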
E0716 23:41:50.868000 130445689476160 torch/distributed/elastic/multiprocessing/api.py:826] failed (exitcode: 1) local_rank: 0 (pid: 200042) of binary: /srv/shared/AI/LoraTraining/kohya_ss/venv/bin/python
However, if I remove the is_reg option and hit "Start training", I get the following error:
INFO 11395 train images with repeating. train_util.py:1678
INFO 0 reg images. train_util.py:1681
WARNING no regularization images (no regularization images were found) train_util.py:1686
Traceback (most recent call last):
File "/srv/shared/AI/LoraTraining/kohya_ss/sd-scripts/sdxl_train.py", line 948, in <module>
train(args)
File "/srv/shared/AI/LoraTraining/kohya_ss/sd-scripts/sdxl_train.py", line 170, in train
train_dataset_group = config_util.generate_dataset_group_by_blueprint(blueprint.dataset_group)
File "/srv/shared/AI/LoraTraining/kohya_ss/sd-scripts/library/config_util.py", line 487, in generate_dataset_group_by_blueprint
dataset = dataset_klass(subsets=subsets, **asdict(dataset_blueprint.params))
File "/srv/shared/AI/LoraTraining/kohya_ss/sd-scripts/library/train_util.py", line 2038, in __init__
len(missing_imgs) == 0
AssertionError: missing conditioning data for 5662 images (conditioning images were not found): ['s2_000000001', 's2_000000002', 's2_000000003', 's2_000000004', 's2_000000005', 's2_000000006', 's2_000000007', 's2_000000008', 's2_000000009', 's2_000000010', 's2_000000011', 's2_000000012', 's2_000000013', 's2_000000014', 's2_000000015', 's2_000000016', 's2_000000017', 's2_000000018',
...
I can't figure out why the is_reg parameter is not supported. Any help is really appreciated!
Sorry. Wrong repository. Opened it in the GUI issues: https://github.com/bmaltais/kohya_ss/issues/2647
Actually I realized that the issue comes from sdxl_train.py which is sd-scripts. So I believe the issue was opened correctly. Thus, reopening it.
conditioning_data_dir cannot be used with is_reg (due to architectural issues).
Thank you for the information. However, if I remove conditioning_data_dir from the second (DreamBooth) subset, it fails with an exception on the first subset saying that conditioning_data_dir is not expected.
This is a bit strange because the documentation says that conditioning_data_dir should work with DreamBooth approach, but maybe I'm missing something: https://github.com/kohya-ss/sd-scripts/blob/main/docs/train_lllite_README.md#preparing-the-dataset
Thank you for reporting the issue. I think you are using masked loss with the dataset with conditioning_data_dir. Unfortunately, conditioning_data_dir is not supported with the is_reg option. I will update the documentation.
As a workaround, please set the number of repeats for each dataset to balance the number of images.
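The maintainer's workaround can be sketched as follows: instead of marking the second subset with is_reg, treat the regularization images as an ordinary masked subset (so conditioning_data_dir remains allowed) and use num_repeats to keep the two image pools balanced. The paths and repeat counts below are illustrative, and this assumes the regularization images get masks of their own:

```toml
[general]
shuffle_caption = false
caption_extension = ".txt"
keep_tokens = 1

[[datasets]]
resolution = [1024, 1280]
batch_size = 1
enable_bucket = true
bucket_no_upscale = true

  # Training images, repeated to balance against the larger reg set
  [[datasets.subsets]]
  image_dir = "/path/to/images/"
  conditioning_data_dir = "/path/to/masks/"
  num_repeats = 63

  # Regularization images as a plain subset (no is_reg), with their own masks
  [[datasets.subsets]]
  image_dir = "/path/to/reg_images/"
  conditioning_data_dir = "/path/to/reg_masks/"
  num_repeats = 1
```

Without is_reg the trainer no longer treats the second subset as prior-preservation data, so the balancing via num_repeats is what approximates the regularization effect.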
Do you mean conditioning_data_dir doesn't work in the subset where is_reg is used ([[datasets.subsets]]), or do you mean conditioning_data_dir doesn't work in all subsets if at least one subset contains the is_reg option?
In other words, is this a valid configuration?
[general]
shuffle_caption = false
caption_extension = ".txt"
keep_tokens = 1
[[datasets]]
resolution = [1024, 1280]
batch_size = 1
enable_bucket = true
bucket_no_upscale = true
[[datasets.subsets]]
image_dir = "/path/to/images/"
conditioning_data_dir = "/path/to/masks/"
num_repeats = 63
[[datasets.subsets]]
is_reg = true
image_dir = "/path/to/reg_images/"
cache_info = true
num_repeats = 1
conditioning_data_dir doesn't work in all subsets if at least one subset contains the is_reg option
This appears to be the case. I guess it makes sense that you wouldn't want to use masked training images with unmasked regularization images?