Hong Zhang
Thanks for the suggestion. We have modified entity_transfer.py: instead of cropping the corners, it now resizes directly, and the generate function returns target_image directly. Due to the input constraints of In-Context LoRA, the model input must still be a left-right concatenation of the two images, so we implement this logic inside the generate function. As for the prompts: in our tests the output is not very sensitive to entity_prompt, but the global prompt must follow the In-Context LoRA format to work well. Reference: https://github.com/ali-vilab/In-Context-LoRA
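The left-right concatenation described above can be sketched as follows. This is a minimal illustration, not the actual entity_transfer.py code; the helper name `concat_left_right` and the resize-to-reference-height rule are assumptions:

```python
from PIL import Image

def concat_left_right(ref: Image.Image, target: Image.Image) -> Image.Image:
    # Resize the target to the reference image's height (no cropping),
    # then paste the two images side by side on one canvas.
    h = ref.height
    target = target.resize((int(target.width * h / target.height), h))
    canvas = Image.new("RGB", (ref.width + target.width, h))
    canvas.paste(ref, (0, 0))
    canvas.paste(target, (ref.width, 0))
    return canvas
```

Inside generate, the concatenated canvas would be fed to the model, and only the right half returned as target_image.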
For the input image processing of qwen-image-edit, diffusers by default resizes the image so that its area is approximately 1024 × 1024 pixels while preserving the aspect ratio. We have aligned with this behavior. Please...
@yinguoweiOvO
1. In our tests, when the initial noise and the target image's width/height are the same, diffusers and DiffSynth produce essentially identical results. The difference you see is because diffusers by default sets the width and height from the resized input image, so the image it generated is 1088 × 960; the input and target sizes match, hence the good result. DiffSynth instead uses the width/height you pass in: you passed 1440 × 1280, but the input image was resized to 1088 × 960, so the sizes mismatch and the result degrades. If you also set DiffSynth's width/height to 1088 × 960, you get a similar result.
2. I also tested diffusers with the width/height manually set to 1440 × 1280, and the result is likewise poor.
3. The two libraries also use somewhat different initial noise, and the model itself has some inherent randomness.
4. Finally: we still recommend keeping the input and output resolutions the same. If they differ, or the input is low-resolution, consider: https://www.modelscope.cn/models/DiffSynth-Studio/Qwen-Image-Edit-Lowres-Fix

Attached: DiffSynth output with input width/height set to 1088 × 960
The current version of the EliGen dataset and model are both generated and trained based on Flux, so Sana is not yet supported. We are currently evaluating the possibility of...
#369 We are working on it
Pass the `path` parameter as a `list[str]`, for example (imports added for completeness; the import path follows the DiffSynth-Studio Qwen-Image examples):
```python
import torch
from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig

pipe = QwenImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(path=[
            "models/Qwen/Qwen-Image/transformer/diffusion_pytorch_model-00001-of-00009.safetensors",
            "models/Qwen/Qwen-Image/transformer/diffusion_pytorch_model-00002-of-00009.safetensors",
            "models/Qwen/Qwen-Image/transformer/diffusion_pytorch_model-00003-of-00009.safetensors",
            "models/Qwen/Qwen-Image/transformer/diffusion_pytorch_model-00004-of-00009.safetensors",
            "models/Qwen/Qwen-Image/transformer/diffusion_pytorch_model-00005-of-00009.safetensors",
            "models/Qwen/Qwen-Image/transformer/diffusion_pytorch_model-00006-of-00009.safetensors",
            "models/Qwen/Qwen-Image/transformer/diffusion_pytorch_model-00007-of-00009.safetensors",
            "models/Qwen/Qwen-Image/transformer/diffusion_pytorch_model-00008-of-00009.safetensors",
            "models/Qwen/Qwen-Image/transformer/diffusion_pytorch_model-00009-of-00009.safetensors",
        ]),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="text_encoder/model*.safetensors"),
        ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="vae/diffusion_pytorch_model.safetensors"),
    ],
    tokenizer_config=ModelConfig(model_id="Qwen/Qwen-Image", origin_file_pattern="tokenizer/"),
)
```
The visualization results are derived from the regional attention (RA) layer at the last double-stream transformer block, showing the first ten denoising timesteps. Referring to the source code, we temporarily...
In the implementation of Regional Attention, local prompt tokens, global prompt tokens, and latent tokens are concatenated together. So of the 5120 tokens, 0:512 are tokens from the local prompt `person`,...
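The concatenated layout above can be sketched as follows. Only the total of 5120 tokens and the 0:512 local-prompt range are stated here; the split of the remaining 4608 tokens between global-prompt and latent tokens, and the embedding size, are assumptions for illustration:

```python
import torch

LOCAL_LEN = 512                              # tokens 0:512 — local prompt "person"
GLOBAL_LEN = 512                             # assumed global-prompt length
LATENT_LEN = 5120 - LOCAL_LEN - GLOBAL_LEN   # remaining tokens assumed to be latents
dim = 64                                     # arbitrary embedding size for illustration

# The three token groups are concatenated along the sequence axis.
tokens = torch.cat([
    torch.randn(LOCAL_LEN, dim),    # local prompt tokens
    torch.randn(GLOBAL_LEN, dim),   # global prompt tokens
    torch.randn(LATENT_LEN, dim),   # image latent tokens
], dim=0)

local_tokens = tokens[0:LOCAL_LEN]  # slice used when inspecting region "person"
```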
All example datasets for DiffSynth-Studio are placed in [DiffSynth-Studio/example_image_dataset](https://www.modelscope.cn/datasets/DiffSynth-Studio/example_image_dataset). The example for EliGen is:
```json
[
    {
        "image": "eligen/image.png",
        "prompt": "A beautiful girl wearing shirt and shorts in the street,...
```