DiffSynth-Studio: diffusers fixed some bugs in qwen-image-edit. Do these bugs also exist in diffsynth?

@Artiprocher @mi804 https://github.com/huggingface/diffusers/pull/12190 https://github.com/huggingface/diffusers/pull/12188

Aug 20 '25 01:08 akk-123

When processing the input image for qwen-image-edit, diffusers by default resizes it to an area of about 1024×1024 pixels while preserving the aspect ratio. We have aligned with this behavior.
Please refer to: https://github.com/modelscope/DiffSynth-Studio/blob/9ec06523393526ceceee7f2528001507289664b8/examples/qwen_image/model_inference/Qwen-Image-Edit.py#L21

Meanwhile, we also support preserving the original resolution of the input image. In fact, we found that the optimal editing performance is achieved when the input resolution matches the resolution of the generated image.
Please refer to: https://github.com/modelscope/DiffSynth-Studio/blob/9ec06523393526ceceee7f2528001507289664b8/examples/qwen_image/model_inference/Qwen-Image-Edit.py#L25

Aug 20 '25 05:08 mi804

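After updating the code and setting edit_image_auto_resize to True, the diffsynth repository preserves character consistency less well than diffusers, and the generated images come out with a yellow tint. With edit_image_auto_resize set to False, prompt adherence gets worse. Could you look into this?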

Aug 20 '25 08:08 yinguoweiOvO

Could you provide some example cases?

Aug 20 '25 08:08 mi804

I'll try to reproduce it with images that I'm able to share. It may be a while before I can post them.

Aug 20 '25 09:08 yinguoweiOvO

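The three images above are, in order, the original image, the diffsynth edit, and the diffusers edit. The prompt was “男生和女生在海边拥抱在一起。保持人物一致性” ("A boy and a girl embracing by the sea. Keep the characters consistent."). The diffusers result also changes somewhat, but the diffsynth result seems to change more. Some other cases do work with diffsynth, but there are cases where diffusers keeps good consistency while the style changes considerably with diffsynth.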

Aug 20 '25 10:08 yinguoweiOvO

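Sorry, my mistake: I did not specify the image resolution when generating with diffusers. It looks like this model's output quality depends strongly on the image resolution.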

Aug 20 '25 12:08 yinguoweiOvO

@yinguoweiOvO

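In my tests, with the same initial noise and the same target width and height, diffusers and diffsynth produce essentially the same results. Your results differ because diffusers by default sets the output width and height from the size of the resized input image, so the diffusers image you posted is 1088 × 960. The input and target images have the same size, so the result is better. diffsynth by default uses the width and height you pass in. You passed 1440 × 1280 while the input image was resized to 1088 × 960, so the sizes did not match and the result was worse. Setting the diffsynth width and height to 1088 × 960 as well gives a similar result.
I also tested diffusers with the width and height manually set to 1440 × 1280, and the result was likewise poor.
The two libraries also use somewhat different initial noise, and the model's output is itself somewhat random.
Finally: we still recommend keeping the input and output resolutions the same. If they differ, or if the input is low resolution, consider: https://www.modelscope.cn/models/DiffSynth-Studio/Qwen-Image-Edit-Lowres-Fix

Attached: diffsynth with the width and height set to 1088 × 960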
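
For reference, the 1088 × 960 figure can be reproduced with a few lines of arithmetic. This is a minimal sketch, not code copied from either library: `target_size` is a hypothetical name, and rounding each side to a multiple of 32 is an assumption chosen because it reproduces the numbers quoted in this thread.

```python
import math


def target_size(width, height, target_area=1024 * 1024, multiple=32):
    """Return (width, height) covering roughly target_area pixels.

    The aspect ratio of the input is preserved before rounding.
    Rounding to a multiple of 32 is an assumption, not a confirmed detail.
    """
    ratio = width / height
    new_w = math.sqrt(target_area * ratio)
    new_h = new_w / ratio
    # Snap each side to the nearest multiple (assumed to be 32).
    new_w = round(new_w / multiple) * multiple
    new_h = round(new_h / multiple) * multiple
    return new_w, new_h


print(target_size(1440, 1280))  # (1088, 960)
```

A 1440 × 1280 input therefore ends up at 1088 × 960, which is why requesting a 1440 × 1280 output leaves the conditioning image and the generated image at different resolutions.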

Aug 20 '25 12:08 mi804

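I just noticed this as well. Setting the output width and height to the size of the resized condition image gave results close to diffusers. Thanks for your reply : )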

Aug 20 '25 12:08 yinguoweiOvO

diffusers quality is still very low. I tested it on an H200 just 1 hour ago.

Presets shared here : https://www.patreon.com/posts/114517862

also preparing a new tutorial

my preset 12 steps

swarmui + comfyui my presets 50 steps

official code from Hugging Face with latest diffusers

original image

prompt is : change hair color to blue

Aug 20 '25 12:08 FurkanGozukara

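It would be more convenient to add a parameter that automatically sets the output width and height from the size of the resized input image.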
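
A rough sketch of what such an option could look like follows. It is purely illustrative: `resolve_output_size` and its arguments are hypothetical names, not part of the DiffSynth-Studio or diffusers API. The idea is that the output size falls back to the resized input size whenever the caller does not specify one, and a warning is raised if the two differ.

```python
import warnings


def resolve_output_size(resized_input_size, width=None, height=None):
    """Choose the output (width, height) for an edit.

    resized_input_size: (width, height) of the input image after resizing.
    width / height: explicit output size, or None to follow the input.
    Hypothetical helper for illustration only.
    """
    in_w, in_h = resized_input_size
    out_w = in_w if width is None else width
    out_h = in_h if height is None else height
    if (out_w, out_h) != (in_w, in_h):
        # The thread reports worse edits when these sizes differ.
        warnings.warn(
            f"output size {out_w}x{out_h} differs from resized input "
            f"{in_w}x{in_h}; editing quality may degrade"
        )
    return out_w, out_h


print(resolve_output_size((1088, 960)))  # (1088, 960)
```

Defaulting to the input size keeps the common case correct without extra arguments, while an explicit width and height still override it.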

Aug 21 '25 08:08 akk-123

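Hi everyone, I have recently been porting a distilled LoRA (https://github.com/ModelTC/Qwen-Image-Lightning) to diffsynth_studio. The weight key-matching problem is now solved, but I found that the diffusers inference script also appears to change the scheduler configuration, and the flow-matching scheduler implementations in diffusers and diffsynth_studio differ. Has anyone run into a similar problem?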

Oct 28 '25 02:10 TumCCC