[Feature] Full Training for VSA
Motivation
Can I perform full training for VSA rather than fine-tuning? If so, how should I modify the scripts or training code? Looking forward to your reply.
Related resources
No response
Did you mean pre-training?
Yes. I don't want to fine-tune Wan-1.3B. Instead, I want to train it from scratch. How can I achieve this?
It’s not supported yet, but we plan to add it in the future.
Can you give some ideas on how to pre-train VSA? Thanks a lot!
Pretraining a diffusion model has much higher data requirements than fine-tuning, and it requires image data or starting from an image model checkpoint. The Wan or HunyuanVideo tech reports may be of use:
- https://arxiv.org/pdf/2503.20314
- https://arxiv.org/pdf/2412.03603