Open-Sora-Plan
[feat] Dataset embeddings/latents caching for more flexible experiments
Running VAEs and CLIP/T5 embedders is expensive in time, and this cost scales up quickly when multiple trainings are re-run.
Since these parts are kept frozen and only the diffusion model is trained, we could precompute them once and store them on disk as raw tensors to be reused in each training run.
See these files for a possible implementation:
https://github.com/ExponentialML/Text-To-Video-Finetuning/blob/main/train.py
https://github.com/ExponentialML/Text-To-Video-Finetuning/blob/main/utils/dataset.py
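A minimal sketch of such a caching pass. The `vae` and `text_encoder` below are hypothetical placeholders standing in for the project's frozen pretrained models; the file layout (`{sample_id}.pt` under a cache directory) is an assumption, not the repository's actual scheme:

```python
import os
import torch
from torch import nn

# Hypothetical stand-ins for the frozen models; in practice these would be
# the pretrained VAE and CLIP/T5 text encoder.
vae = nn.Conv2d(3, 4, kernel_size=8, stride=8)  # frames -> latents
text_encoder = nn.Embedding(1000, 64)           # token ids -> embeddings

@torch.no_grad()
def cache_sample(video: torch.Tensor, token_ids: torch.Tensor,
                 cache_dir: str, sample_id: str) -> str:
    """Precompute latents/embeddings once and store them as raw tensors."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, f"{sample_id}.pt")
    if not os.path.exists(path):  # skip the forward passes on re-runs
        torch.save(
            {"latents": vae(video), "text_emb": text_encoder(token_ids)},
            path,
        )
    return path

def load_cached(path: str) -> dict:
    """The training loop loads tensors directly, no frozen forward passes."""
    return torch.load(path)
```

A dataset class would then return `load_cached(path)` instead of raw frames and captions, so repeated training runs pay the encoding cost only once.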
Good job, this is of benefit for training models with large datasets.