
⭐ [Feature] Support deepspeed training for DiT

Open: clownrat6 opened this issue 4 months ago • 1 comment

Changed

  • Code style:
    • Rewrite the DiT modeling code, splitting dit.py into modeling_dit.py and configuration_dit.py (a sketch of the split follows this list).
  • Accelerate w/ DeepSpeed training: support training DiT with Accelerate using the DeepSpeed backend (see the sketch under "How to Train").
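To illustrate the split, here is a minimal sketch in the transformers-style convention the file names suggest. The class names, base classes, and fields below are assumptions for illustration, not the repository's exact code: configuration_dit.py would hold only hyperparameters, and modeling_dit.py would build the network from that config object.

```python
# configuration_dit.py (illustrative sketch; base class and fields are assumptions)
from transformers import PretrainedConfig

class DiTConfig(PretrainedConfig):
    model_type = "dit"

    def __init__(self, hidden_size=1152, depth=28, num_heads=16, patch_size=2, **kwargs):
        self.hidden_size = hidden_size
        self.depth = depth
        self.num_heads = num_heads
        self.patch_size = patch_size
        super().__init__(**kwargs)


# modeling_dit.py (illustrative sketch; the real blocks are DiT blocks, not vanilla encoder layers)
import torch.nn as nn
from transformers import PreTrainedModel

class DiTModel(PreTrainedModel):
    config_class = DiTConfig

    def __init__(self, config: DiTConfig):
        super().__init__(config)
        # Stand-in stack of transformer layers built purely from the config.
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(
                d_model=config.hidden_size,
                nhead=config.num_heads,
                batch_first=True,
            )
            for _ in range(config.depth)
        )

    def forward(self, hidden_states):
        for block in self.blocks:
            hidden_states = block(hidden_states)
        return hidden_states
```

The benefit of this layout is that hyperparameters live in one serializable config class, so checkpoints can be reloaded without re-specifying architecture arguments by hand.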

How to Train

  1. Accelerate w/ ZeRO-2 training:
bash scripts/sky/train_256_dsz2_dit.sh
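For reference, a minimal sketch of the Accelerate + DeepSpeed ZeRO-2 wiring that such a script typically sets up. The model, data, and hyperparameters below are placeholders, and the snippet is meant to be run under `accelerate launch`; only the plugin/prepare wiring is the point here.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

# ZeRO-2 partitions optimizer states and gradients across data-parallel ranks.
ds_plugin = DeepSpeedPlugin(zero_stage=2, gradient_accumulation_steps=1)
accelerator = Accelerator(deepspeed_plugin=ds_plugin, mixed_precision="fp16")

model = torch.nn.Linear(1152, 1152)                # stand-in for the DiT/Latte model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
dataset = TensorDataset(torch.randn(1000, 1152))   # dummy data
loader = DataLoader(dataset, batch_size=5)         # per-GPU micro-batch, e.g. bs = 5

# prepare() wraps the model/optimizer/dataloader in the DeepSpeed engine.
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

model.train()
for (batch,) in loader:
    loss = model(batch).pow(2).mean()              # dummy loss for the sketch
    accelerator.backward(loss)                     # routed through DeepSpeed's backward
    optimizer.step()
    optimizer.zero_grad()
```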

Specifically, we attempted to train Latte on V100 GPUs:

  1. ZeRO-2 training (bs = 5):
Memory cost: [screenshot: dit_mem]

Loss curve: [screenshot]

Tests

Training Latte with Accelerate and DeepSpeed ZeRO-2 (bs 5 × 8, num_frames 16, sample_rate 3, 19,500 steps):


https://github.com/PKU-YuanGroup/Open-Sora-Plan/assets/58427300/d0ad508c-729a-47b6-9571-c823aff7badb
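For readers unfamiliar with the num_frames / sample_rate pair: one common interpretation is that each training clip consists of 16 frames taken every 3rd frame of the source video. The helper below is a hypothetical sketch of that sampling scheme, not the repository's dataset code.

```python
import random

def sample_frame_indices(video_length: int, num_frames: int = 16, sample_rate: int = 3) -> list:
    """Pick `num_frames` indices spaced `sample_rate` apart from a random window."""
    span = (num_frames - 1) * sample_rate + 1   # frames covered by one clip, e.g. 46
    if video_length < span:
        raise ValueError(f"video too short: {video_length} < {span}")
    start = random.randint(0, video_length - span)
    return [start + i * sample_rate for i in range(num_frames)]

print(sample_frame_indices(120))  # e.g. [7, 10, 13, ..., 52]
```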

clownrat6 · Mar 12 '24 13:03

@sennnnn would you mind mirroring this to Open(MM)DiT? https://github.com/NUS-HPC-AI-Lab/OpenDiT/

kabachuha · Mar 12 '24 14:03

> @sennnnn would you mind mirroring this to Open(MM)DiT? https://github.com/NUS-HPC-AI-Lab/OpenDiT/

Thanks for your advice. I will complete it later.

clownrat6 · Mar 13 '24 02:03