motion-diffusion-model
Why are different models used for condition c and time t?
I'm trying to understand the design choices made in this code. Specifically, why is the condition c passed through a random mask and then a linear layer, while the time t is passed through an MLP? Why not use a linear layer instead of an MLP for the time encoding, or an MLP for the condition c and a linear layer for the time? I'm having trouble understanding how these design decisions are made, and I would like to be able to make informed decisions myself when using similar models...
The random mask on c is applied DURING TRAINING ONLY. It is there so the model learns classifier-free guidance. Please read the paper for more details.
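To make the masking idea concrete, here is a minimal sketch in plain Python, not the repository's actual code. The names `mask_cond`, `cond_mask_prob` and `guided_output`, and the default probability of 0.1, are assumptions chosen for illustration. During training, each sample's condition embedding is zeroed with some probability, so the same network also learns an unconditional prediction. At sampling time the two predictions can then be combined with a guidance scale.

```python
import random


def mask_cond(cond, training, cond_mask_prob=0.1, rng=random):
    """Zero out each sample's condition embedding with probability cond_mask_prob.

    cond: list of per-sample embeddings (each a list of floats).
    The mask is applied only when training is True.
    """
    if not training or cond_mask_prob <= 0.0:
        return [list(row) for row in cond]
    masked = []
    for row in cond:
        drop = rng.random() < cond_mask_prob  # one Bernoulli draw per sample
        masked.append([0.0] * len(row) if drop else list(row))
    return masked


def guided_output(out_cond, out_uncond, scale):
    """Classifier-free guidance: start from the unconditional prediction
    and move toward the conditional one by a factor of `scale`."""
    return [u + scale * (c - u) for c, u in zip(out_cond, out_uncond)]
```

With `scale = 1.0` this reduces to the plain conditional prediction; larger values push the sample harder toward the condition. None of this is possible unless the model saw masked conditions during training, which is why the mask exists.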