DiffSynth-Studio
Enjoy the magic of Diffusion models!
Hello, thank you for your work. I have two questions about modifying the model structure: 1. In the original Wan2.1-Fun-V1.1-1.3B-Control model, the inputs are reference_image (a reference image), control_video (a control video), and video (the ground-truth video). I now want to combine a first-frame image-to-video generation model with the original model, so I added an extra input_images entry to extra_inputs; the pipeline then automatically calls WanVideoUnit_ImageEmbedderVAE to update y without changing the model structure. Can a new model be trained this way? 2. I have made structural changes to the wan_video_dit model, for example adding an extra action input. Is adding the new checkpoint's hash key to self.keys_hash_with_shape_dict enough to register and use the new model?
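The first question assumes a pipeline built from units that fire only when their declared inputs are present, so adding a key to extra_inputs activates a unit without touching the DiT itself. A minimal sketch of that pattern follows; the class names, the `vae(...)` tag, and `run_pipeline` are hypothetical stand-ins, not the actual DiffSynth-Studio implementation.

```python
# Hypothetical sketch of the "pipeline units keyed by inputs" pattern.
# Names here are illustrative, not the real DiffSynth-Studio code.

class Unit:
    def __init__(self, input_names):
        # The inputs this unit needs before it can run.
        self.input_names = input_names

    def process(self, inputs):
        raise NotImplementedError


class ImageEmbedderUnit(Unit):
    def __init__(self):
        super().__init__(["input_image"])

    def process(self, inputs):
        # In a real pipeline this would encode the first frame with the VAE
        # and fold it into the conditioning tensor y; here we just tag it.
        return {"y": f"y + vae({inputs['input_image']})"}


def run_pipeline(units, inputs):
    state = dict(inputs)
    for unit in units:
        # A unit only fires when every input it declares is present, so
        # supplying "input_image" activates the embedder while leaving the
        # rest of the model structure unchanged.
        if all(name in state for name in unit.input_names):
            state.update(unit.process(state))
    return state


state = run_pipeline(
    [ImageEmbedderUnit()],
    {"input_image": "frame0", "control_video": "ctrl"},
)
print(state["y"])
```

With "input_image" omitted from the inputs, the unit simply does not run and `y` is never produced, which is the behavior the question relies on.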
(attached images: before training / after training)
Hello, thank you for your work. May I ask, if I want to add a module to WanVideoUnit, such as adding an action embedding module and combining the action embedding...
Does the old version of the framework code support the Wan2.2 series?
**Version: 1.1.8** I am encountering an issue where a LoRA trained on the **Wan2.1-I2V-14B-480P (Image-to-Video)** base model does not produce the expected video output. ### Successful Scenario (Working) * A...
I noticed that although ```QwenImagePipeline.__call__``` passes the tiled, tile_size, and tile_stride parameters into self.vae.decode, ```QwenImageVAE.decode``` only receives them via **kwargs and never actually uses them. https://github.com/modelscope/DiffSynth-Studio/blob/afd101f3452c9ecae0c87b79adfa2e22d65ffdc3/diffsynth/pipelines/qwen_image.py#L444 https://github.com/modelscope/DiffSynth-Studio/blob/afd101f3452c9ecae0c87b79adfa2e22d65ffdc3/diffsynth/models/qwen_image_vae.py#L716 My understanding of the model internals is limited, so I would like to ask: is tiled decoding simply not implemented yet, or is the QwenImageVAE architecture inherently unable to support it? Thank you for your time and contributions!
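The behavior described above stems from a general Python pattern: a method whose signature only has **kwargs will silently accept and ignore whatever keyword arguments the caller passes. The sketch below illustrates this with simplified stand-in classes, not the actual QwenImageVAE code.

```python
# Simplified stand-ins illustrating why the call "succeeds" even though
# the tiling options have no effect. Not the real DiffSynth-Studio code.

class VAENoTiling:
    def decode(self, latents, **kwargs):
        # tiled / tile_size / tile_stride land in kwargs but are never
        # read, so the options are silently discarded.
        return f"decoded {latents} (ignored: {sorted(kwargs)})"


class VAEWithTiling:
    def decode(self, latents, tiled=False, tile_size=64, tile_stride=32):
        # A real tiled implementation would split `latents` into
        # overlapping tiles of `tile_size` spaced `tile_stride` apart,
        # decode each tile, and blend the seams.
        mode = f"tiled({tile_size}/{tile_stride})" if tiled else "full"
        return f"decoded {latents} via {mode}"


print(VAENoTiling().decode("z", tiled=True, tile_size=128, tile_stride=64))
print(VAEWithTiling().decode("z", tiled=True, tile_size=128, tile_stride=64))
```

So a **kwargs-only signature by itself only tells you the parameters are currently unused; it says nothing about whether the architecture could support tiled decoding.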
Hi, do you know how to load Qwen-Image-Lightning's acceleration LoRA in qwen-image-edit-2509? In DiffSynth-Studio I loaded the acceleration LoRA and it shows 720 tensors are loaded by LoRA, but the...
How much data and how many training steps are needed for a qwen-image-edit LoRA?
Has the AIGC framework in ModelScope been modified in any way? And how can the loss value be obtained, and the results be aligned?
Hi, I used LoRA to fine-tune the Wan2.1-VACE-1.3B model with the script `examples/wanvideo/model_training/lora/Wan2.1-VACE-1.3B.sh`. I used only one data pair and trained with all default parameters in the training script....