[Feature] Full Training for VSA

Open clytze0216 opened this issue 5 months ago • 5 comments

Motivation

Can I perform full training for VSA rather than fine-tuning? If so, how should I modify the scripts or training code? Looking forward to your reply.

Related resources

No response

clytze0216 avatar Jul 14 '25 13:07 clytze0216

Did you mean pre-training?

BrianChen1129 avatar Jul 14 '25 21:07 BrianChen1129

> Did you mean pre-training?

Yes, I don't want to fine-tune Wan-1.3B. Instead, I want to train it from scratch. How can I achieve this?

clytze0216 avatar Jul 15 '25 02:07 clytze0216

It’s not supported yet, but we plan to add it in the future.

BrianChen1129 avatar Jul 15 '25 09:07 BrianChen1129

> It’s not supported yet, but we plan to add it in the future.

Can you give some ideas on how to pre-train VSA? Thanks a lot!

clytze0216 avatar Jul 17 '25 03:07 clytze0216

Pre-training diffusion models places much higher demands on data, and it requires either image data or starting from an image model checkpoint. The Wan or HunyuanVideo tech reports may be of use:

  • https://arxiv.org/pdf/2503.20314
  • https://arxiv.org/pdf/2412.03603

SolitaryThinker avatar Aug 29 '25 04:08 SolitaryThinker