SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation

¹National University of Singapore, ²Sun Yat-sen University, ³Alibaba Group
(† Corresponding authors.)

Introduction

SRDiffusion is a novel video diffusion inference framework that reduces computation costs through Sketching-Rendering Cooperation between large and small models:

  • The large model handles high-noise steps, preserving semantic and motion fidelity (Sketching).

  • The small model processes low-noise steps to refine visual details (Rendering).

SRDiffusion achieves up to 3× speedup on Wan and 2× speedup on CogVideoX with minimal quality degradation. It is also complementary to other optimization techniques.
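
The division of labor described above can be illustrated with a toy denoising loop. This is only a minimal sketch under stated assumptions, not the repository's implementation: the names `sketch_then_render`, `relative_change`, and `toward_one` are hypothetical, the latent is a plain list of floats, and the switching rule (hand over to the small model once consecutive predictions of the large model differ by less than a threshold) is an assumed reading of what a flag such as `--self_diff_threshold` might control.

```python
from typing import Callable, List, Tuple

Latent = List[float]
Denoiser = Callable[[Latent, int], Latent]


def relative_change(prev: Latent, curr: Latent) -> float:
    """Summed absolute change between two predictions, relative to the previous one."""
    num = sum(abs(c - p) for p, c in zip(prev, curr))
    den = sum(abs(p) for p in prev) or 1.0  # avoid division by zero
    return num / den


def sketch_then_render(
    latent: Latent,
    large_model: Denoiser,
    small_model: Denoiser,
    num_steps: int,
    threshold: float,
) -> Tuple[Latent, List[str]]:
    """Run early (high-noise) steps on the large model, later steps on the small one.

    Assumed rule: switch permanently to the small model once the large model's
    consecutive predictions change by less than `threshold`.
    """
    schedule: List[str] = []  # which model handled each step
    use_small = False
    prev_pred = None
    for step in range(num_steps):
        if use_small:
            pred = small_model(latent, step)  # rendering phase
            schedule.append("small")
        else:
            pred = large_model(latent, step)  # sketching phase
            schedule.append("large")
            if prev_pred is not None and relative_change(prev_pred, pred) < threshold:
                use_small = True  # every later step goes to the small model
        prev_pred = pred
        latent = pred  # toy update: the prediction becomes the next latent
    return latent, schedule


def toward_one(latent: Latent, step: int) -> Latent:
    """Stand-in denoiser: moves every value halfway toward 1.0."""
    return [x + 0.5 * (1.0 - x) for x in latent]
```

Calling `sketch_then_render([0.0, 0.0], toward_one, toward_one, num_steps=8, threshold=0.05)` with this stand-in denoiser runs the first five steps on the "large" model and the last three on the "small" one. In a real pipeline the two callables would wrap the 14B and 1.3B checkpoints, and the update would go through the scheduler rather than a direct assignment.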

SRDiffusion for Wan 2.1

  1. Follow the Wan2.1 setup guide to set up the environment and download the 14B and 1.3B models.

  2. Run the following command to generate a video:

    python ./wan/srd_generate.py --task t2v-14B --size 832*480 \
        --ckpt_dir ./Wan2.1-T2V-14B --rendering_ckpt_dir ./Wan2.1-T2V-1.3B \
        --offload_model True --t5_cpu \
        --self_diff_threshold 0.01 \
        --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
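
When launching the command above from a Python script (for example, to sweep over prompts or thresholds), it may help to build the argument list explicitly. The sketch below is an illustration rather than part of the repository: the helper name `build_wan_srd_command` is hypothetical, and it only reuses the flags and paths shown in the command above. Passing the arguments as a list also means no shell is involved, so the `*` in `832*480` is not subject to glob expansion.

```python
import sys
from typing import List


def build_wan_srd_command(
    prompt: str,
    self_diff_threshold: float = 0.01,
    size: str = "832*480",
    ckpt_dir: str = "./Wan2.1-T2V-14B",
    rendering_ckpt_dir: str = "./Wan2.1-T2V-1.3B",
) -> List[str]:
    """Assemble the argv list for the Wan 2.1 SRDiffusion command shown above."""
    return [
        sys.executable, "./wan/srd_generate.py",
        "--task", "t2v-14B",
        "--size", size,
        "--ckpt_dir", ckpt_dir,                      # large (sketching) model
        "--rendering_ckpt_dir", rendering_ckpt_dir,  # small (rendering) model
        "--offload_model", "True",
        "--t5_cpu",
        "--self_diff_threshold", str(self_diff_threshold),
        "--prompt", prompt,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root.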
    

SRDiffusion for CogVideoX

  1. Follow the CogVideo setup guide to prepare the environment.

  2. Run the following command to generate a video:

    python cli_demo_srd.py --seed 42 --num_frames 49 --fps 8 \
        --self_diff_threshold 0.01 \
        --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
    

Acknowledgement

This repository is built upon Wan2.1 and CogVideoX. We sincerely thank the contributors of these projects for their excellent work.

Contributing

For contribution guidelines, please refer to CONTRIBUTING.md.

Contributors

SRDiffusion is developed by Alibaba Group and NUS HPC-AI Lab. This work is supported by Alibaba Innovative Research (AIR).

License

SRDiffusion is licensed under the Apache License (Version 2.0). See the LICENSE file for more details. This project also includes third-party test cases released under other open-source licenses. Please refer to the NOTICE file for more information.

Citation

If you find SRDiffusion useful, please consider starring the project ⭐ and citing it with the following BibTeX entry:

@misc{cheng2025srdiffusion,
      title={SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation}, 
      author={Shenggan Cheng and Yuanxin Wei and Lansong Diao and Yong Liu and Bujiao Chen and Lianghua Huang and Yu Liu and Wenyuan Yu and Jiangsu Du and Wei Lin and Yang You},
      year={2025},
      eprint={2505.19151},
      archivePrefix={arXiv},
      primaryClass={cs.GR},
      url={https://arxiv.org/abs/2505.19151}, 
}