
Training fails when switching from/to the Prodigy optimizer during training

Open mcDandy opened this issue 1 week ago • 0 comments

This is for bugs only

Did you already ask in the discord?

Yes/No

You verified that this is a bug and not a feature request or question by asking in the discord?

Yes/No

Describe the bug

Changing the optimizer to Prodigy mid-training on Flux.2 throws the following error:

Traceback (most recent call last):
  File "D:\stability\Data\Packages\AI-Toolkit\run.py", line 120, in <module>
    main()
  File "D:\stability\Data\Packages\AI-Toolkit\run.py", line 108, in main
    raise e
  File "D:\stability\Data\Packages\AI-Toolkit\run.py", line 96, in main
    job.run()
  File "D:\stability\Data\Packages\AI-Toolkit\jobs\ExtensionJob.py", line 22, in run
    process.run()
  File "D:\stability\Data\Packages\AI-Toolkit\jobs\process\BaseSDTrainProcess.py", line 2162, in run
    loss_dict = self.hook_train_loop(batch_list)
  File "D:\stability\Data\Packages\AI-Toolkit\extensions_built_in\sd_trainer\SDTrainer.py", line 2075, in hook_train_loop
    self.optimizer.step()
  File "D:\stability\Data\Packages\AI-Toolkit\venv\Lib\site-packages\accelerate\optimizer.py", line 179, in step
    self.optimizer.step(closure)
  File "D:\stability\Data\Packages\AI-Toolkit\venv\Lib\site-packages\torch\optim\lr_scheduler.py", line 124, in wrapper
    return func.__get__(opt, opt.__class__)(*args, **kwargs)
  File "D:\stability\Data\Packages\AI-Toolkit\venv\Lib\site-packages\torch\optim\optimizer.py", line 485, in wrapper
    out = func(*args, **kwargs)
  File "D:\stability\Data\Packages\AI-Toolkit\venv\Lib\site-packages\prodigyopt\prodigy.py", line 115, in step
    use_bias_correction = group['use_bias_correction']
KeyError: 'use_bias_correction'
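A likely mechanism (my assumption, not confirmed in the report): when resuming, the saved optimizer state is restored into the freshly built Prodigy instance via `load_state_dict`, and `torch.optim.Optimizer.load_state_dict` takes each param group's hyperparameters from the checkpoint, keeping only `params` from the new optimizer. If the checkpoint was written by a different optimizer (e.g. AdamW), Prodigy-specific keys such as `use_bias_correction` are absent from the restored groups, so the next `step()` hits exactly the `KeyError` above. A minimal pure-Python sketch of that mechanism, using simplified stand-in classes (not the real torch/prodigyopt ones):

```python
class FakeAdamW:
    """Stand-in for the optimizer that wrote the checkpoint."""
    def state_dict(self):
        # AdamW-style param group: no Prodigy-specific keys.
        return {"param_groups": [{"lr": 1e-4, "weight_decay": 1e-4}]}

class FakeProdigy:
    """Stand-in for Prodigy after switching optimizers."""
    def __init__(self):
        # Prodigy's own defaults include use_bias_correction.
        self.param_groups = [{"lr": 1.0, "use_bias_correction": False}]

    def load_state_dict(self, sd):
        # Mirrors torch.optim.Optimizer.load_state_dict: the saved
        # groups' hyperparameters replace the current ones, dropping
        # any keys the checkpointed optimizer never had.
        self.param_groups = sd["param_groups"]

    def step(self):
        for group in self.param_groups:
            # Equivalent of prodigyopt/prodigy.py line 115:
            use_bias_correction = group["use_bias_correction"]  # raises here

opt = FakeProdigy()
opt.load_state_dict(FakeAdamW().state_dict())
error = None
try:
    opt.step()
except KeyError as e:
    error = e
print(error)  # 'use_bias_correction'
```

The same collision would occur in the other direction too: a Prodigy checkpoint loaded into AdamW simply carries extra keys AdamW ignores, which is why only the switch *to* Prodigy crashes.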

settings:

---
job: "extension"
config:
  name: "plfun3-flux2_r32"
  process:
    - type: "diffusion_trainer"
      training_folder: "D:\\stability\\Data\\Packages\\AI-Toolkit\\output"
      sqlite_db_path: "./aitk_db.db"
      device: "cuda"
      trigger_word: null
      performance_log_every: 10
      network:
        type: "lora"
        linear: 32
        linear_alpha: 32
        conv: 16
        conv_alpha: 16
        lokr_full_rank: true
        lokr_factor: -1
        network_kwargs:
          ignore_if_contains: []
        log_dir: "output/plfun3-flux2_r32/tensorboard.tensoroard"
        log_config:
          log_every: 1
      save:
        dtype: "bf16"
        save_every: 50
        max_step_saves_to_keep: 600
        save_format: "diffusers"
        push_to_hub: false
      datasets:
        ...
      train:
        batch_size: 1
        bypass_guidance_embedding: false
        steps: 20000
        gradient_accumulation: 1
        train_unet: true
        train_text_encoder: false
        gradient_checkpointing: true
        noise_scheduler: "flowmatch"
        optimizer: "Prodigy"
        timestep_type: "weighted"
        content_or_style: "content"
        optimizer_params:
          weight_decay: 0.0001
        unload_text_encoder: false
        cache_text_embeddings: true
        lr: 0.0001
        ema_config:
          use_ema: false
          ema_decay: 0.99
        skip_first_sample: false
        force_first_sample: false
        disable_sampling: false
        dtype: "bf16"
        diff_output_preservation: false
        diff_output_preservation_multiplier: 1
        diff_output_preservation_class: "person"
        switch_boundary_every: 1
        loss_type: "mse"
      model:
        name_or_path: "black-forest-labs/FLUX.2-dev"
        quantize: true
        qtype: "qfloat8"
        quantize_te: true
        qtype_te: "qfloat8"
        arch: "flux2"
        low_vram: true
        model_kwargs:
          match_target_res: false
        layer_offloading: true
        layer_offloading_text_encoder_percent: 1
        layer_offloading_transformer_percent: 0.9
      sample:
...
meta:
  name: "plfun3"
  version: "1.0"

mcDandy avatar Dec 12 '25 10:12 mcDandy