ai-toolkit
Training fails when switching to/from the Prodigy optimizer during training
This is for bugs only
Did you already ask in the discord?
Yes/No
You verified that this is a bug and not a feature request or question by asking in the discord?
Yes/No
Describe the bug
Switching the optimizer to Prodigy mid-training on FLUX.2 throws this error:
```
Traceback (most recent call last):
  File "D:\stability\Data\Packages\AI-Toolkit\run.py", line 120, in <module>
    main()
  File "D:\stability\Data\Packages\AI-Toolkit\run.py", line 108, in main
    raise e
  File "D:\stability\Data\Packages\AI-Toolkit\run.py", line 96, in main
    job.run()
  File "D:\stability\Data\Packages\AI-Toolkit\jobs\ExtensionJob.py", line 22, in run
    process.run()
  File "D:\stability\Data\Packages\AI-Toolkit\jobs\process\BaseSDTrainProcess.py", line 2162, in run
    loss_dict = self.hook_train_loop(batch_list)
  File "D:\stability\Data\Packages\AI-Toolkit\extensions_built_in\sd_trainer\SDTrainer.py", line 2075, in hook_train_loop
    self.optimizer.step()
  File "D:\stability\Data\Packages\AI-Toolkit\venv\Lib\site-packages\accelerate\optimizer.py", line 179, in step
    self.optimizer.step(closure)
  File "D:\stability\Data\Packages\AI-Toolkit\venv\Lib\site-packages\torch\optim\lr_scheduler.py", line 124, in wrapper
    return func.__get__(opt, opt.__class__)(*args, **kwargs)
  File "D:\stability\Data\Packages\AI-Toolkit\venv\Lib\site-packages\torch\optim\optimizer.py", line 485, in wrapper
    out = func(*args, **kwargs)
  File "D:\stability\Data\Packages\AI-Toolkit\venv\Lib\site-packages\prodigyopt\prodigy.py", line 115, in step
    use_bias_correction = group['use_bias_correction']
KeyError: 'use_bias_correction'
```
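The KeyError is consistent with the checkpoint's optimizer state (written by the previous optimizer) being loaded into the freshly constructed Prodigy instance: `torch.optim.Optimizer.load_state_dict()` replaces each param group's hyperparameters with the saved ones, so Prodigy-only keys such as `use_bias_correction` disappear from the groups. A minimal sketch of that failure mode, using stock PyTorch optimizers (SGD stands in for Prodigy so it runs without `prodigyopt`; this is an illustration of the mechanism, not AI-Toolkit's code):

```python
import torch

param = torch.nn.Parameter(torch.zeros(4))

# Phase 1: the checkpoint was written while training with a different optimizer.
adamw = torch.optim.AdamW([param], lr=1e-4)
checkpoint = adamw.state_dict()

# Phase 2: the run resumes with a new optimizer class, and the old state dict
# is loaded into it. load_state_dict() overwrites the param groups' keys with
# the saved (AdamW) ones, dropping the keys the new class needs.
sgd = torch.optim.SGD([param], lr=1e-4, momentum=0.9)
sgd.load_state_dict(checkpoint)
print('momentum' in sgd.param_groups[0])  # False: the key SGD needs is gone

param.grad = torch.ones_like(param)
try:
    sgd.step()  # same shape of failure as the reported traceback
except KeyError as e:
    print(f"KeyError: {e}")  # KeyError: 'momentum'
```

With Prodigy the missing key is `use_bias_correction`, read at `prodigy.py` line 115, exactly as in the traceback.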
Settings:

```yaml
---
job: "extension"
config:
  name: "plfun3-flux2_r32"
  process:
    - type: "diffusion_trainer"
      training_folder: "D:\\stability\\Data\\Packages\\AI-Toolkit\\output"
      sqlite_db_path: "./aitk_db.db"
      device: "cuda"
      trigger_word: null
      performance_log_every: 10
      network:
        type: "lora"
        linear: 32
        linear_alpha: 32
        conv: 16
        conv_alpha: 16
        lokr_full_rank: true
        lokr_factor: -1
        network_kwargs:
          ignore_if_contains: []
      log_dir: "output/plfun3-flux2_r32/tensorboard.tensoroard"
      log_config:
        log_every: 1
      save:
        dtype: "bf16"
        save_every: 50
        max_step_saves_to_keep: 600
        save_format: "diffusers"
        push_to_hub: false
      datasets:
        ...
      train:
        batch_size: 1
        bypass_guidance_embedding: false
        steps: 20000
        gradient_accumulation: 1
        train_unet: true
        train_text_encoder: false
        gradient_checkpointing: true
        noise_scheduler: "flowmatch"
        optimizer: "Prodigy"
        timestep_type: "weighted"
        content_or_style: "content"
        optimizer_params:
          weight_decay: 0.0001
        unload_text_encoder: false
        cache_text_embeddings: true
        lr: 0.0001
        ema_config:
          use_ema: false
          ema_decay: 0.99
        skip_first_sample: false
        force_first_sample: false
        disable_sampling: false
        dtype: "bf16"
        diff_output_preservation: false
        diff_output_preservation_multiplier: 1
        diff_output_preservation_class: "person"
        switch_boundary_every: 1
        loss_type: "mse"
      model:
        name_or_path: "black-forest-labs/FLUX.2-dev"
        quantize: true
        qtype: "qfloat8"
        quantize_te: true
        qtype_te: "qfloat8"
        arch: "flux2"
        low_vram: true
        model_kwargs:
          match_target_res: false
        layer_offloading: true
        layer_offloading_text_encoder_percent: 1
        layer_offloading_transformer_percent: 0.9
      sample:
        ...
meta:
  name: "plfun3"
  version: "1.0"
```
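If the cause is indeed a foreign optimizer state dict being restored on resume, one possible workaround (a hedged sketch, not AI-Toolkit's resume code) is to backfill any hyperparameter keys the loaded dict lacked from the new optimizer's own `defaults`, which every `torch.optim.Optimizer` subclass (including Prodigy) stores at construction:

```python
import torch

def load_state_dict_tolerant(optimizer, checkpoint_state):
    """Load a possibly-foreign optimizer state dict, then backfill any
    hyperparameter keys it lacked from this optimizer's own defaults."""
    optimizer.load_state_dict(checkpoint_state)
    for group in optimizer.param_groups:
        for key, value in optimizer.defaults.items():
            group.setdefault(key, value)  # restore keys the old dict dropped

# Demo with stock optimizers (SGD again stands in for Prodigy):
param = torch.nn.Parameter(torch.zeros(4))
checkpoint = torch.optim.AdamW([param], lr=1e-4).state_dict()
sgd = torch.optim.SGD([param], lr=1e-4, momentum=0.9)
load_state_dict_tolerant(sgd, checkpoint)

param.grad = torch.ones_like(param)
sgd.step()  # no KeyError: the missing 'momentum' key was backfilled
print(bool(param.abs().sum() > 0))  # the step actually moved the parameter
```

Deleting the previous optimizer state (or starting a fresh run) when changing the `optimizer` setting should also sidestep the crash, at the cost of losing optimizer momentum/state.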