DiffSynth-Studio issues

Question regarding the noise scheduler and training objectives

1

Thank you for open-sourcing the code for Wan video! The quality is truly amazing. I really enjoy using the model to generate all kinds of videos. And they...

yccyenchicheng

ZeRO-3 communication error when saving the model

1

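I am running SFT on Wan2.1-T2V-14B with ZeRO-3 enabled and zero3_save_16bit_model: true set in accelerate.yaml. Communication hangs while the model is being saved. According to the logs, the cause is that rank 0 uses the op allgather_base while the other ranks use the op allreduce. Switching to ZeRO-2 works normally. Is this a DeepSpeed bug, or does the DiffSynth model need to be adapted?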

MaxwellDing

Question about the sinusoidal timestep encoding

https://github.com/modelscope/DiffSynth-Studio/blob/main/diffsynth/models/qwen_image_dit.py#L414 Why is the scale value set to 1000 here? Isn't that too large? SD3, for example, just uses the default value of 1: https://github.com/modelscope/DiffSynth-Studio/blob/main/diffsynth/models/sd3_dit.py#L346
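For context, here is a minimal sketch of a generic sinusoidal timestep embedding (a common sin-then-cos layout; this is not DiffSynth's code, and the function names and defaults are assumptions). The `scale` factor multiplies the timestep before the frequencies are applied, so `scale=1000` on a timestep in [0, 1] yields exactly the same embedding as `scale=1` on a timestep in [0, 1000]. Whether 1000 is appropriate therefore plausibly depends on the timestep range each pipeline passes in, which would need to be checked against the actual call sites.

```python
import math


def timestep_embedding(t: float, dim: int = 8, scale: float = 1.0,
                       max_period: float = 10000.0) -> list[float]:
    """Generic sinusoidal timestep embedding: sin half followed by cos half."""
    half = dim // 2
    # Geometric ladder of frequencies from 1 down toward 1 / max_period
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    # The scale factor is applied to the timestep before the frequencies
    args = [t * scale * f for f in freqs]
    return [math.sin(a) for a in args] + [math.cos(a) for a in args]


def max_gap(a: list[float], b: list[float]) -> float:
    """Largest element-wise difference between two embeddings."""
    return max(abs(x - y) for x, y in zip(a, b))


# With t in [0, 1] and scale=1, every argument stays below 1 radian,
# so nearby timesteps produce nearly identical embeddings.
print(max_gap(timestep_embedding(0.2), timestep_embedding(0.3)))
# Scaling by 1000 spreads the same two timesteps much further apart.
print(max_gap(timestep_embedding(0.2, scale=1000.0),
              timestep_embedding(0.3, scale=1000.0)))
```

In this sketch, a model fed normalized timesteps with `scale=1` would see only small changes in its timestep features across the whole schedule, whereas a model fed timesteps already in [0, 1000] would need no extra scaling.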

Jimzhou82sub

Is batch_size > 1 not supported when training Wan2.2?

3

The code here forcibly takes only the first element. Meanwhile, unified_dataset returns a list of PIL.Image.

shinetzh

diffusers fixed the bugs in qwen-image-edit. Do these bugs exist in diffsynth?

11

@Artiprocher @mi804 https://github.com/huggingface/diffusers/pull/12190 https://github.com/huggingface/diffusers/pull/12188

akk-123

Question about Wan 2.2 I2V Training

6

When performing inference using the high-noise model and low-noise model with LoRA applied, the generated video exhibits sudden brightness changes (similar to turning off a light), and the video becomes...

DuanCB

What should the edit_image field contain?

5

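When training with qwen-image-edit.sh, I see this field: --data_file_keys "image,edit_image" \. What is edit_image supposed to contain? I couldn't find any documentation on it, and omitting this field causes an error.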

mihongyu

Design question about Flux-Kontext supporting multiple kontext_images inputs

4

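Hello, and thank you for open-sourcing DiffSynth-Studio! While reading the implementation of FluxImageUnit_Kontext in [flux_image_new.py](https://github.com/modelscope/DiffSynth-Studio/blob/main/diffsynth/pipelines/flux_image_new.py), I noticed that the current code accepts multiple kontext_images and concatenates the resulting kontext_latent tensors along a non-channel dimension (dim=1):

```python3
kontext_latents = torch.concat(kontext_latents, dim=1)
kontext_image_ids = torch.concat(kontext_image_ids, dim=-2)
```

This does not seem fully consistent with the official open-source Flux-Kontext implementation, which only supports a single image. In our tests, feeding in multiple images produced output that simply blended the images together, and the overall result was not coherent. So I have a few questions:

1. What was the original intent behind supporting multiple kontext_images? Is it preparation for future extensions such as multi-image editing and multi-image fusion?
2. ...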
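To make the shapes concrete, here is a minimal sketch of what concatenating along dim=1 does, using NumPy in place of torch. It assumes packed latents of shape (batch, tokens, channels) and position ids of shape (batch, tokens, 3); the token counts, the channel width, and the function name are hypothetical, not taken from DiffSynth. Under these assumptions, each reference image becomes an extra run of tokens in one longer sequence rather than extra channels.

```python
import numpy as np


def concat_kontext(latents: list[np.ndarray],
                   image_ids: list[np.ndarray]) -> tuple[np.ndarray, np.ndarray]:
    """Join per-image token sequences into one sequence (NumPy analogue of the torch.concat calls in the issue)."""
    # Axis 1 is the token axis of (batch, tokens, channels); channels are untouched
    merged_latents = np.concatenate(latents, axis=1)
    # Axis -2 is the token axis of (batch, tokens, 3) position ids
    merged_ids = np.concatenate(image_ids, axis=-2)
    return merged_latents, merged_ids


batch, channels = 1, 64
# Two hypothetical reference images packed into 1024 and 256 tokens
latents = [np.zeros((batch, 1024, channels)), np.zeros((batch, 256, channels))]
ids = [np.zeros((batch, 1024, 3)), np.zeros((batch, 256, 3))]
merged_latents, merged_ids = concat_kontext(latents, ids)
print(merged_latents.shape, merged_ids.shape)  # (1, 1280, 64) (1, 1280, 3)
```

Because the channel axis stays fixed, the transformer in this picture sees a longer sequence, not a wider feature vector.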

Dshijie

Some questions about training the qwen-image distill model

5

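Hello, I have some questions from reading the source code. Is the distilled model trained by simply distilling CFG away through SFT (supervised fine-tuning)? Is there no additional distillation constraint, such as a method like [DMD: Distribution Matching Distillation](https://arxiv.org/abs/2311.18828)? I noticed that in the code, for both the "distilled model" and "non-distilled model" training pipelines, inputs_nega is not used after data preprocessing. For training the non-distilled model, shouldn't the prompt be dropped with some probability, so that images can be generated via inputs_nega? Currently, inputs_nega does not appear to be used at all.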
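For reference, the prompt-dropping recipe the question alludes to is the standard classifier-free guidance setup: during training the prompt is occasionally replaced by an empty prompt, so one network learns both conditional and unconditional predictions, which are then combined at inference. Below is a minimal sketch with hypothetical function names and a 10% drop rate chosen only for illustration. It is not DiffSynth's training code and does not show how inputs_nega is built there.

```python
import random


def maybe_drop_prompt(prompt: str, drop_prob: float, rng: random.Random) -> str:
    """Hypothetical helper: swap the prompt for an empty string with probability drop_prob."""
    # rng.random() is in [0, 1), so drop_prob=0 never drops and drop_prob=1 always drops
    return "" if rng.random() < drop_prob else prompt


def cfg_combine(cond: list[float], uncond: list[float], guidance_scale: float) -> list[float]:
    """Classifier-free guidance at inference: uncond + w * (cond - uncond)."""
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


rng = random.Random(0)
prompts = ["a cat", "a dog", "a red car", "a forest"]
# During training, roughly 10% of prompts would be replaced by "" (unconditional)
training_prompts = [maybe_drop_prompt(p, 0.1, rng) for p in prompts]
print(training_prompts)
# At inference, conditional and unconditional predictions are mixed
print(cfg_combine([1.0, 2.0], [0.5, 0.5], 4.0))  # [2.5, 6.5]
```

A CFG-distilled model, by contrast, is meant to produce the guided output in a single forward pass, which is why the question of whether its training needs the unconditional branch at all is a fair one.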

1343744768


Question regarding the noise scheduler and training objectives

ZeRO-3 communication error when saving the model

Adding tea cache wan2.2s2v

Question about the sinusoidal timestep encoding

Is batch_size > 1 not supported when training Wan2.2?

diffusers fixed the bugs in qwen-image-edit. Do these bugs exist in diffsynth?

Question about Wan 2.2 I2V Training

What should the edit_image field contain?

Design question about Flux-Kontext supporting multiple kontext_images inputs

Some questions about training the qwen-image distill model
