
How to train Stable Video Diffusion model?

Open howardgriffin opened this issue 8 months ago • 10 comments

howardgriffin avatar Dec 10 '23 03:12 howardgriffin

Sorry to ask: same question here.

ersanliqiao avatar Dec 26 '23 09:12 ersanliqiao

> Sorry to ask: same question here.

Hello! Do you have a training script for Stable Video Diffusion? Could you please contact me?

Angelalilyer avatar Jan 08 '24 06:01 Angelalilyer

Found a non-official one: #267

shijianjian avatar Jan 29 '24 22:01 shijianjian

Our team has released the SVD training script: https://github.com/mindspore-lab/mindone/tree/master/examples/svd It's still under development, but you can use it for reference.

hadipash avatar Mar 22 '24 08:03 hadipash

What are the GPU requirements for fine-tuning, @hadipash?

bdytx5 avatar Mar 26 '24 20:03 bdytx5

> What are the GPU requirements for fine-tuning, @hadipash?

Currently quite a lot (64 GB), but we're working on reducing VRAM usage.

hadipash avatar Mar 27 '24 01:03 hadipash

@hadipash That's not bad, really. Have you tested it with sharding across multiple GPUs (e.g., multiple GPUs with <64 GB of VRAM each)?

bdytx5 avatar Mar 27 '24 17:03 bdytx5

> @hadipash That's not bad, really. Have you tested it with sharding across multiple GPUs (e.g., multiple GPUs with <64 GB of VRAM each)?

Currently, a sequence of 4 frames can be trained on a single 64GB GPU. We are working on 1) optimizing memory usage on a single device and 2) implementing distributed training to allow for longer sequences (e.g., 30+ frames).
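To get a feel for why longer sequences need distributed training, here is a rough back-of-envelope extrapolation. It assumes (as stated above) that a 4-frame sequence consumes about 64 GB and that training memory grows roughly linearly with sequence length; both numbers are illustrative, not measured profiles:

```python
# Crude linear VRAM model, assuming ~64 GB for a 4-frame clip as
# reported in this thread. Real usage is not exactly linear, so treat
# this purely as an order-of-magnitude sketch.
GB_PER_4_FRAMES = 64
GB_PER_FRAME = GB_PER_4_FRAMES / 4  # ~16 GB per frame under this assumption

def estimated_vram_gb(num_frames: int) -> float:
    """Linearly extrapolated training VRAM for a given clip length."""
    return num_frames * GB_PER_FRAME

# A 25-frame clip (the standard SVD sequence length) would need ~400 GB
# under this crude model, far beyond any single GPU.
print(estimated_vram_gb(25))
```

Under this assumption even an 80 GB A100 can only hold a handful of frames, which is consistent with the out-of-memory reports below.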

hadipash avatar Mar 28 '24 01:03 hadipash

@hadipash Hello, I tried using LoRA to fine-tune SVD's U-Net, and even with a batch size of 1, memory overflows on an A100 GPU when the dataset consists of 25-frame videos. Does this mean that model-parallel training must be used, distributing the model parameters across multiple GPUs?
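For readers unfamiliar with the technique being discussed: LoRA freezes the base weight W and trains only a low-rank update, W + (alpha / r) * B @ A. The sketch below is a generic NumPy illustration of that idea, not code from the repos linked above; note that LoRA reduces *optimizer and gradient* memory for the adapted weights, while activation memory still grows with the number of frames, which is why OOM can occur regardless:

```python
import numpy as np

# Minimal LoRA sketch (hypothetical illustration). The frozen weight W
# is adapted as W + (alpha / r) * B @ A; only the low-rank factors
# A (r x d_in) and B (d_out x r) would be trained.
rng = np.random.default_rng(0)

d_in, d_out, r, alpha = 8, 8, 2, 4
W = rng.standard_normal((d_out, d_in))    # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01 # small random init
B = np.zeros((d_out, r))                  # zero init: no change at step 0

def lora_forward(x: np.ndarray) -> np.ndarray:
    """Forward pass through the LoRA-adapted layer."""
    delta = (alpha / r) * (B @ A)         # low-rank weight update
    return (W + delta) @ x

x = rng.standard_normal(d_in)
# With B == 0 the adapted layer matches the frozen layer exactly,
# so fine-tuning starts from the pretrained behavior.
assert np.allclose(lora_forward(x), W @ x)
```

Here only A and B (2 * r * d parameters instead of d * d) receive gradients, which is why LoRA shrinks the optimizer state but does not by itself shrink the activations of a 25-frame forward pass.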

DataAIPlayer avatar May 08 '24 15:05 DataAIPlayer

@DataAIPlayer I'm not sure about LoRA; we haven't integrated it yet. For vanilla training, though, yes: distributed training is needed, since a single 64 GB GPU can only fit 4 frames.

hadipash avatar May 09 '24 02:05 hadipash