
Qwen Image Edit Plus Training: Missing control images for QwenImageEditPlusModel

Open · Colin-Antony opened this issue 1 month ago · 5 comments

This is for bugs only

Did you already ask in the discord?

Yes. I see this issue has been raised many times. Please help.

You verified that this is a bug and not a feature request or question by asking in the discord?

No, but it is most likely a bug.

Describe the bug

I am training a Qwen Image Edit Plus LoRA on an H100. Although I have provided one target dataset and two control datasets, it still fails with the following error:

Running 1 job

{
    "type": "diffusion_trainer",
    "training_folder": "/home/ubuntu/ai-toolkit/output",
    "sqlite_db_path": "/home/ubuntu/ai-toolkit/aitk_db.db",
    "device": "cuda",
    "trigger_word": null,
    "performance_log_every": 10,
    "network": {
        "type": "lora",
        "linear": 32,
        "linear_alpha": 32,
        "conv": 16,
        "conv_alpha": 16,
        "lokr_full_rank": true,
        "lokr_factor": -1,
        "network_kwargs": {
            "ignore_if_contains": []
        }
    },
    "save": {
        "dtype": "bf16",
        "save_every": 250,
        "max_step_saves_to_keep": 4,
        "save_format": "diffusers",
        "push_to_hub": false
    },
    "datasets": [
        {
            "folder_path": "/home/ubuntu/ai-toolkit/datasets/watch_target",
            "mask_path": null,
            "mask_min_value": 0.1,
            "default_caption": "put this watch on their wrist",
            "caption_ext": "txt",
            "caption_dropout_rate": 0.05,
            "cache_latents_to_disk": false,
            "is_reg": false,
            "network_weight": 1,
            "resolution": [512, 768, 1024],
            "controls": [],
            "shrink_video_to_frames": true,
            "num_frames": 1,
            "do_i2v": true,
            "flip_x": false,
            "flip_y": false,
            "control_path_1": "/home/ubuntu/ai-toolkit/datasets/watch_control",
            "control_path_2": "/home/ubuntu/ai-toolkit/datasets/watch_product",
            "control_path_3": "/home/ubuntu/ai-toolkit/datasets/watch_product"
        }
    ],
    "train": {
        "batch_size": 1,
        "bypass_guidance_embedding": false,
        "steps": 5000,
        "gradient_accumulation": 1,
        "train_unet": true,
        "train_text_encoder": false,
        "gradient_checkpointing": true,
        "noise_scheduler": "flowmatch",
        "optimizer": "adamw8bit",
        "timestep_type": "weighted",
        "content_or_style": "balanced",
        "optimizer_params": {
            "weight_decay": 0.0001
        },
        "unload_text_encoder": false,
        "cache_text_embeddings": false,
        "lr": 0.0001,
        "ema_config": {
            "use_ema": false,
            "ema_decay": 0.99
        },
        "skip_first_sample": false,
        "force_first_sample": false,
        "disable_sampling": false,
        "dtype": "bf16",
        "diff_output_preservation": false,
        "diff_output_preservation_multiplier": 1,
        "diff_output_preservation_class": "person",
        "switch_boundary_every": 1,
        "loss_type": "mse"
    },
    "model": {
        "name_or_path": "Qwen/Qwen-Image-Edit-2509",
        "quantize": false,
        "qtype": "qfloat8",
        "quantize_te": false,
        "qtype_te": "qfloat8",
        "arch": "qwen_image_edit_plus",
        "low_vram": true,
        "model_kwargs": {
            "match_target_res": true
        },
        "layer_offloading": false,
        "layer_offloading_text_encoder_percent": 1,
        "layer_offloading_transformer_percent": 1
    },
    "sample": {
        "sampler": "flowmatch",
        "sample_every": 250,
        "width": 750,
        "height": 1000,
        "samples": [
            {
                "prompt": "put this watch on their wrist",
                "ctrl_img_1": "/home/ubuntu/ai-toolkit/data/images/c4fd0513-b8e4-434f-9831-2f74b6ffd137.png",
                "ctrl_img_2": "/home/ubuntu/ai-toolkit/data/images/b14e86b6-3f08-4d8e-9375-20988c37b556.png"
            },
            {
                "prompt": "put this watch on their wrist",
                "ctrl_img_1": "/home/ubuntu/ai-toolkit/data/images/9fb6cf5b-1fb2-435f-b899-d59727b34806.png",
                "ctrl_img_2": "/home/ubuntu/ai-toolkit/data/images/d7b0fc30-8331-40c3-b08d-2229941b9018.png"
            }
        ],
        "neg": "",
        "seed": 42,
        "walk_seed": true,
        "guidance_scale": 4,
        "sample_steps": 25,
        "num_frames": 1,
        "fps": 1
    }
}

Using SQLite database at /home/ubuntu/ai-toolkit/aitk_db.db

Job ID: "ed6b93f3-75d9-4a59-8a75-5173ae43a9df"

#############################################

Running job: qwen_image_edit_2509_watch_tryon

#############################################

Running 1 process

Loading Qwen Image model

Loading transformer

Loading checkpoint shards: 100%|##########| 5/5 [00:00<00:00, 31.18it/s]

Moving transformer to CPU

Text Encoder

Loading checkpoint shards: 100%|##########| 4/4 [00:00<00:00, 51.85it/s]

Loading VAE

Making pipe

Preparing Model

Model Loaded

create LoRA network. base dim (rank): 32, alpha: 32

neuron dropout: p=None, rank dropout: p=None, module dropout: p=None

apply LoRA to Conv2d with kernel size (3,3). dim (rank): 16, alpha: 16

create LoRA for Text Encoder: 0 modules.

create LoRA for U-Net: 840 modules.

enable LoRA for U-Net

Dataset: /home/ubuntu/ai-toolkit/datasets/watch_target

  • Preprocessing image dimensions

100%|##########| 30/30 [00:00<00:00, 19295.99it/s]

  • Found 30 images

Bucket sizes for /home/ubuntu/ai-toolkit/datasets/watch_target:

416x576: 30 files

1 buckets made

Dataset: /home/ubuntu/ai-toolkit/datasets/watch_target

  • Preprocessing image dimensions

100%|##########| 30/30 [00:00<00:00, 22270.64it/s]

  • Found 30 images

Bucket sizes for /home/ubuntu/ai-toolkit/datasets/watch_target:

672x864: 30 files

1 buckets made

Dataset: /home/ubuntu/ai-toolkit/datasets/watch_target

  • Preprocessing image dimensions

100%|##########| 30/30 [00:00<00:00, 18371.90it/s]

  • Found 30 images

Bucket sizes for /home/ubuntu/ai-toolkit/datasets/watch_target:

736x960: 30 files

1 buckets made

Generating baseline samples before training

qwen_image_edit_2509_watch_tryon: 0%| | 0/5000 [00:00<?, ?it/s]
Error running job: Missing control images for QwenImageEditPlusModel

========================================

Result:

  • 0 completed jobs

  • 1 failure

========================================

Traceback (most recent call last):
  File "/home/ubuntu/ai-toolkit/run.py", line 120, in <module>
    main()
  File "/home/ubuntu/ai-toolkit/run.py", line 108, in main
    raise e
  File "/home/ubuntu/ai-toolkit/run.py", line 96, in main
    job.run()
  File "/home/ubuntu/ai-toolkit/jobs/ExtensionJob.py", line 22, in run
    process.run()
  File "/home/ubuntu/ai-toolkit/jobs/process/BaseSDTrainProcess.py", line 2162, in run
    loss_dict = self.hook_train_loop(batch_list)
  File "/home/ubuntu/ai-toolkit/extensions_built_in/sd_trainer/SDTrainer.py", line 2055, in hook_train_loop
    loss = self.train_single_accumulation(batch)
  File "/home/ubuntu/ai-toolkit/extensions_built_in/sd_trainer/SDTrainer.py", line 1562, in train_single_accumulation
    conditional_embeds = self.sd.encode_prompt(
  File "/home/ubuntu/ai-toolkit/toolkit/models/base_model.py", line 1069, in encode_prompt
    return self.get_prompt_embeds(prompt, control_images=control_images)
  File "/home/ubuntu/ai-toolkit/extensions_built_in/diffusion_models/qwen_image/qwen_image_edit_plus.py", line 170, in get_prompt_embeds
    raise ValueError("Missing control images for QwenImageEditPlusModel")
ValueError: Missing control images for QwenImageEditPlusModel

qwen_image_edit_2509_watch_tryon: 0%| | 0/5000 [00:01<?, ?it/s]

Colin-Antony · Nov 21 '25 16:11

I fixed this by enabling the offload of 100% of the text encoder to the GPU. However, this shows there is still an underlying issue: since I am using an H100, the model should train comfortably on it without throwing an OOM. This might be a bad coupling between the offloading feature and Qwen Image Edit? Offloading was initially disabled, which is when it failed.

Please correct me if I am wrong about the OOM.

Regards

Colin-Antony · Nov 21 '25 17:11

Hi. The toggle is definitely the issue. I turned the toggle on and set both values to 0, and it works (which should ideally be the same as the toggle being off). Just letting you know in case the info is useful.
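For reference, the workaround maps onto these keys from the config posted above (values as described in this comment; whether 0 here truly behaves the same as the toggle being off is exactly the open question):

"model": {
    "layer_offloading": true,
    "layer_offloading_text_encoder_percent": 0,
    "layer_offloading_transformer_percent": 0
}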

Colin-Antony · Nov 21 '25 17:11

Related issues:

  • https://github.com/ostris/ai-toolkit/issues/472
  • https://github.com/ostris/ai-toolkit/issues/441

If you just disable caching of the text embeddings, it gets you past this error, but it then causes the errors in issue #441.
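For reference, that setting is this key in the "train" section of the config above:

"train": {
    "cache_text_embeddings": false
}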

jsermeno · Nov 22 '25 01:11

@jsermeno I faced another issue when I disabled text embeddings caching. I will test it again before concluding.

Colin-Antony · Nov 22 '25 06:11

I was able to figure out my issue, and it looks like this error can have multiple root causes. I now have a job that finished successfully. I turned text embeddings caching back on, but found that datasets can only have lowercase file extensions. This line defines the allowed extensions:

img_ext_list = ['.jpg', '.jpeg', '.png', '.webp']

Since several of my images came from a camera, where uppercase extensions are common ('JPG', 'JPEG', etc.), my control images were not being processed. There are several places in the code that repeat this logic of only processing control images with lowercase extensions.

I'd be happy to submit a pull request to fix this if the maintainer would like to accept case-insensitive control image extensions. One way to do it is to check whether a file with the exact filename from the target dataset exists; right now, the logic takes only the base filename from the target file and appends a lowercase extension. A rough sketch is below.
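For illustration, a minimal sketch of that kind of case-insensitive lookup, assuming target and control images are paired by base filename; find_control_image and its signature are hypothetical, not ai-toolkit's actual code:

from pathlib import Path

# Allowed extensions, compared case-insensitively.
img_ext_list = ['.jpg', '.jpeg', '.png', '.webp']

def find_control_image(control_dir, target_file):
    """Return the control image whose base name matches the target image,
    accepting any capitalization of the extension ('.JPG', '.Png', ...)."""
    stem = Path(target_file).stem
    for candidate in Path(control_dir).iterdir():
        # Compare the suffix case-insensitively instead of assuming lowercase.
        if candidate.stem == stem and candidate.suffix.lower() in img_ext_list:
            return candidate
    return None  # no matching control image found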

jsermeno · Nov 22 '25 13:11