generative-models
Is anyone trying to train their own SVD-based Camera Motion LoRA model?
I tried using LoRA to fine-tune the SVD U-Net, and even with a batch size of 1, I run out of memory on an A100-80G GPU when the dataset consists of 25-frame videos. I also tried DeepSpeed, but it didn't help. Does this mean model-parallel training is required, i.e., distributing the model parameters across multiple GPUs?
Why don't you lower your image resolution? With a batch size of 1, 512 x 512 x 25 frames runs on an A6000, which has 48G of VRAM.
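The suggestion above can be made concrete with a back-of-envelope calculation: per-layer activation size in the U-Net scales with frames x height x width, so dropping from 1024x576 to 512x512 at the same frame count cuts activation memory by a fixed ratio (illustrative arithmetic only; the real footprint also depends on channels, attention layout, and precision):

```python
# Rough estimate of how much activation memory the lower resolution saves.
# Activations scale with frames * height * width per layer, so the ratio
# between the two settings is resolution-only at a fixed 25 frames.
frames = 25
hi_res = 1024 * 576   # pixels per frame at the requested resolution
lo_res = 512 * 512    # pixels per frame at the suggested resolution

ratio = (frames * hi_res) / (frames * lo_res)
print(ratio)  # 2.25 -> the 1024x576 run needs roughly 2.25x the activation memory
```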
I am experimenting with fine-tuning a motion LoRA and need to generate videos at a resolution of 1024x576. Do you mean that a motion LoRA trained at a lower resolution can still achieve camera control when inferring at a higher resolution?
Oh, if you must train at that resolution, lowering it won't help :( Enabling xformers and gradient checkpointing might, though.
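A minimal sketch of the gradient-checkpointing part of that suggestion, using plain `torch.utils.checkpoint` (the same mechanism diffusers-style code exposes via `unet.enable_gradient_checkpointing()`). The tiny MLP here is a stand-in for a U-Net block, not SVD's actual model:

```python
# Gradient checkpointing trades compute for memory: activations are
# discarded in the forward pass and recomputed during backward.
import torch
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)
block = torch.nn.Sequential(
    torch.nn.Linear(64, 256), torch.nn.GELU(), torch.nn.Linear(256, 64)
)
x = torch.randn(2, 64, requires_grad=True)

# Normal forward: all intermediate activations are kept for backward.
y_plain = block(x)

# Checkpointed forward: activations are recomputed in backward instead.
y_ckpt = checkpoint(block, x, use_reentrant=False)

print(torch.allclose(y_plain, y_ckpt))  # True -- outputs match, only peak memory differs
```

The xformers side is a one-liner in diffusers-based training scripts (`unet.enable_xformers_memory_efficient_attention()`), which swaps the attention implementation without changing the model's outputs meaningfully.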
Is there an open source SVD fine-tuning method?
https://github.com/alibaba/animate-anything/blob/main/train_svd.py