[Feature] More Training details for Wan2.2 14B MoE self-forcing distillation
Motivation
Hi there!
Great work on the Wan2.2 14B MoE self-forcing distillation – the results are impressive!
I’m curious whether you have plans to share more training details, especially regarding how you handled the memory challenges. In particular, I’d love to learn how you deal with the memory spike caused by backpropagating through the KV cache during training.
Thanks a lot for your contributions and looking forward to your insights!
Related resources
No response
Our current training recipe isn't good enough yet, so we're still iterating on different settings. We'll release it once we've converged on a final recipe.
As for the KV cache memory spike, we unfortunately haven't addressed it yet, since we're focusing on quality first. But we do plan to tackle it with the trick mentioned in Krea's blog: https://www.krea.ai/blog/krea-realtime-14b#
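One common way to mitigate that spike (not necessarily the exact trick in Krea's post) is to stop gradients from flowing back through the cached keys/values of previously generated chunks, e.g. by detaching the cache or recomputing it under torch.no_grad(), so the backward pass only has to hold activations for the chunk currently being denoised. A minimal toy sketch of that truncation (ToyAttention and all the names/shapes below are made up for illustration, not the actual Wan2.2 training code):

```python
# Toy sketch only: ToyAttention, chunk_len, num_chunks etc. are made-up names
# for illustration, not the Wan2.2 or Krea training code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyAttention(nn.Module):
    """Single attention layer with an external KV cache (causal masking omitted)."""
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.heads = heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, kv_cache=None):
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # (b, t, d) -> (b, heads, t, head_dim)
        q, k, v = (z.reshape(b, t, self.heads, -1).transpose(1, 2) for z in (q, k, v))
        if kv_cache is not None:
            k = torch.cat([kv_cache[0], k], dim=2)  # also attend over cached context
            v = torch.cat([kv_cache[1], v], dim=2)
        out = F.scaled_dot_product_attention(q, k, v)
        out = out.transpose(1, 2).reshape(b, t, d)
        return self.proj(out), (k, v)

model = ToyAttention()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

batch, chunk_len, dim, num_chunks = 2, 8, 64, 4
kv_cache = None
for step in range(num_chunks):
    x = torch.randn(batch, chunk_len, dim)   # stand-in for the current latent chunk
    out, kv_cache = model(x, kv_cache)
    loss = out.pow(2).mean()                 # stand-in for the distillation loss
    loss.backward()
    opt.step()
    opt.zero_grad()
    # Detach the cache so gradients for the next chunk stop at the cache boundary:
    # backward then only needs the current chunk's activations, not the whole rollout.
    kv_cache = tuple(c.detach() for c in kv_cache)
```

The trade-off is that the loss on a given chunk no longer sends gradients into the earlier chunks that produced the cached context, which is why it reads as a truncated-backprop approximation rather than a free win.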
What do you think about the direct forcing trick described here: https://arxiv.org/html/2510.01784v2 ? Have you tried similar approaches?