DiffSynth-Studio/Qwen-Image-EliGen-V2
Hi, could you clarify the required format for --dataset_metadata_path and how it corresponds to --data_file_keys? Should the JSON file be a list of objects with keys such as "image" and "eligen_entity_masks" that directly match the values passed to those arguments?
--dataset_metadata_path data/example_image_dataset/metadata_eligen.json \
--data_file_keys "image,eligen_entity_masks" \
All example datasets for DiffSynth-Studio are placed in DiffSynth-Studio/example_image_dataset. The example for EliGen is:
[
    {
        "image": "eligen/image.png",
        "prompt": "A beautiful girl wearing shirt and shorts in the street, holding a sign 'Entity Control'",
        "eligen_entity_prompts": [
            "A beautiful girl",
            "sign 'Entity Control'",
            "shorts",
            "shirt"
        ],
        "eligen_entity_masks": [
            "eligen/0.png",
            "eligen/1.png",
            "eligen/2.png",
            "eligen/3.png"
        ]
    }
]
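For reference, here is a minimal sketch of how the keys listed in --data_file_keys relate to the metadata: they name the fields whose values are file paths to load, while the remaining fields (prompt, eligen_entity_prompts) stay as plain text. This is not DiffSynth's actual dataset code; the load_sample helper and the assumption that paths are relative to the dataset root are only illustrative.

import json
from pathlib import Path

from PIL import Image

# Illustrative paths mirroring the example above; not DiffSynth's loader itself.
DATASET_ROOT = Path("data/example_image_dataset")
METADATA_PATH = DATASET_ROOT / "metadata_eligen.json"

# The value passed via --data_file_keys: fields whose values are file paths.
DATA_FILE_KEYS = "image,eligen_entity_masks".split(",")

def load_sample(record: dict) -> dict:
    """Resolve file-path fields to PIL images; leave text fields untouched."""
    sample = {}
    for key, value in record.items():
        if key in DATA_FILE_KEYS:
            if isinstance(value, list):  # e.g. eligen_entity_masks
                sample[key] = [Image.open(DATASET_ROOT / p) for p in value]
            else:                        # e.g. image
                sample[key] = Image.open(DATASET_ROOT / value)
        else:                            # prompt, eligen_entity_prompts stay as text
            sample[key] = value
    return sample

with open(METADATA_PATH) as f:
    metadata = json.load(f)  # a JSON list of records like the one above

first = load_sample(metadata[0])
print(type(first["image"]), len(first["eligen_entity_masks"]))

Note that the order of eligen_entity_masks must match the order of eligen_entity_prompts, so each mask pairs with its entity prompt.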
Hi, thanks for the clarification on the dataset format in the previous discussion. I now have another question: I would like to modify the Regional Attention mechanism in Qwen-Image-EliGen-V2, and I plan to implement this within the DiffSynth framework. Could you please advise which part/module of DiffSynth would be the most appropriate place to make these changes, or suggest a recommended approach?
Please refer to the following:
Preprocess: https://github.com/modelscope/DiffSynth-Studio/blob/0d6de58af9269654c3d4ef30de5a12ad1527c826/diffsynth/pipelines/qwen_image.py#L594
Attention Mask Construction: https://github.com/modelscope/DiffSynth-Studio/blob/0d6de58af9269654c3d4ef30de5a12ad1527c826/diffsynth/models/qwen_image_dit.py#L434
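For orientation only, the sketch below illustrates the general idea behind a regional attention mask: each entity prompt's text tokens are restricted to attend to the image tokens covered by that entity's mask. It is not the code at the links above, and build_regional_attention_mask is a hypothetical name used here for illustration.

import torch
import torch.nn.functional as F

def build_regional_attention_mask(
    entity_masks: torch.Tensor,   # (num_entities, H, W) binary masks at image resolution
    entity_token_lens: list[int], # number of text tokens per entity prompt
    latent_hw: tuple[int, int],   # (h, w) of the image token grid inside the DiT
) -> torch.Tensor:
    """Return a (num_text_tokens, num_image_tokens) boolean mask where True
    means "this entity's text token may attend to this image token"."""
    h, w = latent_hw
    # Downsample each spatial mask to the token grid and flatten it.
    masks = F.interpolate(entity_masks[:, None].float(), size=(h, w), mode="nearest")
    masks = masks.flatten(2).squeeze(1) > 0.5             # (num_entities, h*w)

    rows = []
    for entity_mask, n_tokens in zip(masks, entity_token_lens):
        # Every token of this entity prompt shares the same spatial region.
        rows.append(entity_mask[None].expand(n_tokens, -1))
    return torch.cat(rows, dim=0)                          # (sum(token_lens), h*w)

# Toy usage: 2 entities, 32x32 masks, 4x4 token grid.
masks = torch.zeros(2, 32, 32)
masks[0, :16] = 1  # entity 0 occupies the top half
masks[1, 16:] = 1  # entity 1 occupies the bottom half
attn_mask = build_regional_attention_mask(masks, entity_token_lens=[3, 5], latent_hw=(4, 4))
print(attn_mask.shape)  # torch.Size([8, 16])

If you want to change how the regions interact (for example, whether entity tokens also attend to the global prompt or to image tokens outside their mask), the attention-mask construction function linked above is the place where that policy is expressed.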
Thank you so much! I have one more small question: I'd like to add an auxiliary loss. Could you tell me where in the code I should make the change, or how to integrate it with EliGen's original loss implementation?
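For example, conceptually I mean something along these lines (hypothetical names and tensors, not DiffSynth's actual training loop; the auxiliary term here is purely illustrative):

import torch
import torch.nn.functional as F

# Hypothetical weight for the extra term; not an existing DiffSynth argument.
aux_loss_weight = 0.1

def compute_loss(model_pred, target, entity_attn_maps, entity_masks):
    # Base objective: MSE between the DiT prediction and the training target.
    base_loss = F.mse_loss(model_pred, target)

    # Illustrative auxiliary term: encourage per-entity attention maps to
    # agree with the ground-truth entity masks.
    aux_loss = F.mse_loss(entity_attn_maps, entity_masks)

    return base_loss + aux_loss_weight * aux_loss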