Qwen Image Edit Plus Training: Missing control images for QwenImageEditPlusModel
This is for bugs only
Did you already ask in the discord?
Yes. I see this issue has been raised many times. Please help.
You verified that this is a bug and not a feature request or question by asking in the discord?
No. Most likely a bug, though.
Describe the bug
Training a Qwen Image Edit Plus LoRA on an H100. Although I have provided one target and two control datasets, it still fails with the following error:
Running 1 job
{
"type": "diffusion_trainer",
"training_folder": "/home/ubuntu/ai-toolkit/output",
"sqlite_db_path": "/home/ubuntu/ai-toolkit/aitk_db.db",
"device": "cuda",
"trigger_word": null,
"performance_log_every": 10,
"network": {
"type": "lora",
"linear": 32,
"linear_alpha": 32,
"conv": 16,
"conv_alpha": 16,
"lokr_full_rank": true,
"lokr_factor": -1,
"network_kwargs": {
"ignore_if_contains": []
}
},
"save": {
"dtype": "bf16",
"save_every": 250,
"max_step_saves_to_keep": 4,
"save_format": "diffusers",
"push_to_hub": false
},
"datasets": [
{
"folder_path": "/home/ubuntu/ai-toolkit/datasets/watch_target",
"mask_path": null,
"mask_min_value": 0.1,
"default_caption": "put this watch on their wrist",
"caption_ext": "txt",
"caption_dropout_rate": 0.05,
"cache_latents_to_disk": false,
"is_reg": false,
"network_weight": 1,
"resolution": [
512,
768,
1024
],
"controls": [],
"shrink_video_to_frames": true,
"num_frames": 1,
"do_i2v": true,
"flip_x": false,
"flip_y": false,
"control_path_1": "/home/ubuntu/ai-toolkit/datasets/watch_control",
"control_path_2": "/home/ubuntu/ai-toolkit/datasets/watch_product",
"control_path_3": "/home/ubuntu/ai-toolkit/datasets/watch_product"
}
],
"train": {
"batch_size": 1,
"bypass_guidance_embedding": false,
"steps": 5000,
"gradient_accumulation": 1,
"train_unet": true,
"train_text_encoder": false,
"gradient_checkpointing": true,
"noise_scheduler": "flowmatch",
"optimizer": "adamw8bit",
"timestep_type": "weighted",
"content_or_style": "balanced",
"optimizer_params": {
"weight_decay": 0.0001
},
"unload_text_encoder": false,
"cache_text_embeddings": false,
"lr": 0.0001,
"ema_config": {
"use_ema": false,
"ema_decay": 0.99
},
"skip_first_sample": false,
"force_first_sample": false,
"disable_sampling": false,
"dtype": "bf16",
"diff_output_preservation": false,
"diff_output_preservation_multiplier": 1,
"diff_output_preservation_class": "person",
"switch_boundary_every": 1,
"loss_type": "mse"
},
"model": {
"name_or_path": "Qwen/Qwen-Image-Edit-2509",
"quantize": false,
"qtype": "qfloat8",
"quantize_te": false,
"qtype_te": "qfloat8",
"arch": "qwen_image_edit_plus",
"low_vram": true,
"model_kwargs": {
"match_target_res": true
},
"layer_offloading": false,
"layer_offloading_text_encoder_percent": 1,
"layer_offloading_transformer_percent": 1
},
"sample": {
"sampler": "flowmatch",
"sample_every": 250,
"width": 750,
"height": 1000,
"samples": [
{
"prompt": "put this watch on their wrist",
"ctrl_img_1": "/home/ubuntu/ai-toolkit/data/images/c4fd0513-b8e4-434f-9831-2f74b6ffd137.png",
"ctrl_img_2": "/home/ubuntu/ai-toolkit/data/images/b14e86b6-3f08-4d8e-9375-20988c37b556.png"
},
{
"prompt": "put this watch on their wrist",
"ctrl_img_1": "/home/ubuntu/ai-toolkit/data/images/9fb6cf5b-1fb2-435f-b899-d59727b34806.png",
"ctrl_img_2": "/home/ubuntu/ai-toolkit/data/images/d7b0fc30-8331-40c3-b08d-2229941b9018.png"
}
],
"neg": "",
"seed": 42,
"walk_seed": true,
"guidance_scale": 4,
"sample_steps": 25,
"num_frames": 1,
"fps": 1
}
}
Using SQLite database at /home/ubuntu/ai-toolkit/aitk_db.db
Job ID: "ed6b93f3-75d9-4a59-8a75-5173ae43a9df"
#############################################
Running job: qwen_image_edit_2509_watch_tryon
#############################################
Running 1 process
Loading Qwen Image model
Loading transformer
Loading checkpoint shards: 100%|##########| 5/5 [00:00<00:00, 31.18it/s]
Moving transformer to CPU
Text Encoder
Loading checkpoint shards: 100%|##########| 4/4 [00:00<00:00, 51.85it/s]
Loading VAE
Making pipe
Preparing Model
Model Loaded
create LoRA network. base dim (rank): 32, alpha: 32
neuron dropout: p=None, rank dropout: p=None, module dropout: p=None
apply LoRA to Conv2d with kernel size (3,3). dim (rank): 16, alpha: 16
create LoRA for Text Encoder: 0 modules.
create LoRA for U-Net: 840 modules.
enable LoRA for U-Net
Dataset: /home/ubuntu/ai-toolkit/datasets/watch_target
- Preprocessing image dimensions
100%|##########| 30/30 [00:00<00:00, 19295.99it/s]
- Found 30 images
Bucket sizes for /home/ubuntu/ai-toolkit/datasets/watch_target:
416x576: 30 files
1 buckets made
Dataset: /home/ubuntu/ai-toolkit/datasets/watch_target
- Preprocessing image dimensions
100%|##########| 30/30 [00:00<00:00, 22270.64it/s]
- Found 30 images
Bucket sizes for /home/ubuntu/ai-toolkit/datasets/watch_target:
672x864: 30 files
1 buckets made
Dataset: /home/ubuntu/ai-toolkit/datasets/watch_target
- Preprocessing image dimensions
100%|##########| 30/30 [00:00<00:00, 18371.90it/s]
- Found 30 images
Bucket sizes for /home/ubuntu/ai-toolkit/datasets/watch_target:
736x960: 30 files
1 buckets made
Generating baseline samples before training
qwen_image_edit_2509_watch_tryon: 0%| | 0/5000 [00:00<?, ?it/s]
Error running job: Missing control images for QwenImageEditPlusModel
========================================
Result:
 - 0 completed jobs
 - 1 failure
========================================
Traceback (most recent call last):
  File "/home/ubuntu/ai-toolkit/run.py", line 120, in <module>
    main()
  File "/home/ubuntu/ai-toolkit/run.py", line 108, in main
    raise e
  File "/home/ubuntu/ai-toolkit/run.py", line 96, in main
    job.run()
  File "/home/ubuntu/ai-toolkit/jobs/ExtensionJob.py", line 22, in run
    process.run()
  File "/home/ubuntu/ai-toolkit/jobs/process/BaseSDTrainProcess.py", line 2162, in run
    loss_dict = self.hook_train_loop(batch_list)
  File "/home/ubuntu/ai-toolkit/extensions_built_in/sd_trainer/SDTrainer.py", line 2055, in hook_train_loop
    loss = self.train_single_accumulation(batch)
  File "/home/ubuntu/ai-toolkit/extensions_built_in/sd_trainer/SDTrainer.py", line 1562, in train_single_accumulation
    conditional_embeds = self.sd.encode_prompt(
  File "/home/ubuntu/ai-toolkit/toolkit/models/base_model.py", line 1069, in encode_prompt
    return self.get_prompt_embeds(prompt, control_images=control_images)
  File "/home/ubuntu/ai-toolkit/extensions_built_in/diffusion_models/qwen_image/qwen_image_edit_plus.py", line 170, in get_prompt_embeds
    raise ValueError("Missing control images for QwenImageEditPlusModel")
ValueError: Missing control images for QwenImageEditPlusModel
qwen_image_edit_2509_watch_tryon: 0%| | 0/5000 [00:01<?, ?it/s]
I fixed this by enabling offloading with 100% of the text encoder kept on the GPU. However, this shows there is still an underlying issue: since I am running on an H100, the model should train comfortably and not hit an OOM. Could this be a bad coupling between the offloading feature and Qwen Image Edit? Offloading was initially disabled, which is when it failed.
Please correct me if I am wrong about the OOM.
Regards
Hi. The toggle is definitely the issue. I turned the toggle on and set both values to 0 and it works (which should ideally behave the same as the toggle being off). Just letting you know in case the info is useful.
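Concretely, that corresponds to these keys in the model block of the config above (a sketch of my working settings, not necessarily the right long-term fix):
"layer_offloading": true,
"layer_offloading_text_encoder_percent": 0,
"layer_offloading_transformer_percent": 0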
Related issues:
- https://github.com/ostris/ai-toolkit/issues/472
- https://github.com/ostris/ai-toolkit/issues/441
If you just disable caching of the text embeddings, it gets you past this error, but that then causes the errors in issue #441.
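For reference, that is this existing key in the train block of the config above:
"cache_text_embeddings": false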
@jsermeno I faced another issue when I disabled text embedding caching; I will test it out again before concluding.
I was able to figure out my issue. It looks like this error can have multiple root causes. I now have a job that finished successfully. I turned text embedding caching back on, but discovered that datasets can only have lowercase extensions. This line defines the allowed extensions:
img_ext_list = ['.jpg', '.jpeg', '.png', '.webp']
Since I had several images that came from a camera, where uppercase extensions are common ('JPG', 'JPEG', etc.), my control images were not being processed. Several places in the code repeat this logic of only processing control images with lowercase extensions.
I'd be happy to submit a pull request to fix this if the maintainer would like to accept case-insensitive control image extensions. This can be done by checking whether the exact filename from the target dataset actually exists in the control folder; right now, the logic only takes the base filename from the target file and then appends a lowercase extension.
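A minimal sketch of that matching logic (the helper name and structure are mine, not the actual ai-toolkit code; it assumes control images share the target image's base filename):

import os

# Hypothetical helper, not the actual ai-toolkit implementation: find the
# control image whose base filename matches the target's, comparing
# extensions case-insensitively so '.JPG' resolves the same as '.jpg'.
img_ext_list = ['.jpg', '.jpeg', '.png', '.webp']

def find_control_image(control_dir, target_filename):
    stem = os.path.splitext(os.path.basename(target_filename))[0]
    for name in sorted(os.listdir(control_dir)):
        base, ext = os.path.splitext(name)
        if base == stem and ext.lower() in img_ext_list:
            return os.path.join(control_dir, name)
    return None  # caller can fall back or raise as appropriate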