Why are different models used for condition c and time t?

Open • mccuba opened this issue 1 year ago • 1 comment

I'm trying to understand the design choices made in this code. Specifically, why is the condition c passed through a random mask and then a linear layer, while the time t is passed through an MLP? Why not use a linear layer instead of an MLP for the time encoding, or an MLP for the condition and a linear layer for time? I'm having trouble understanding how these decisions were made, and I'd like to be able to make informed choices myself when using similar models.

May 11 '24 20:05 mccuba

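The random mask on c is applied DURING TRAINING ONLY. It is there so the model learns classifier-free guidance: masking the condition for some samples lets the same network learn both conditional and unconditional predictions. Please read the paper for more details.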

May 24 '24 12:05 GuyTevet
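
For readers following the reply above, here is a minimal sketch of the idea in plain Python. It is an illustration under stated assumptions, not the repository's code: the names `mask_condition`, `guided_prediction`, `drop_prob`, and `scale` are hypothetical, and the embeddings are plain lists rather than tensors. It shows a condition being zeroed at random during training only, and the usual classifier-free guidance blend of conditional and unconditional predictions at sampling time.

```python
import random


def mask_condition(cond, drop_prob, training, rng=random):
    """Zero out a condition embedding with probability drop_prob.

    Hypothetical helper: masking happens only when training is True,
    so inference always sees the condition it was given.
    """
    if training and drop_prob > 0.0 and rng.random() < drop_prob:
        return [0.0] * len(cond)  # this sample is trained as "unconditional"
    return list(cond)


def guided_prediction(pred_cond, pred_uncond, scale):
    """Blend two predictions of the same model at sampling time.

    scale = 0 gives the unconditional prediction, scale = 1 the
    conditional one, and scale > 1 extrapolates past the conditional one.
    """
    return [u + scale * (c - u) for c, u in zip(pred_cond, pred_uncond)]


if __name__ == "__main__":
    cond = [0.3, -1.2, 0.8]
    # Training: the condition is dropped for roughly drop_prob of the samples.
    print(mask_condition(cond, drop_prob=0.1, training=True))
    # Inference: the mask is never applied.
    print(mask_condition(cond, drop_prob=0.1, training=False))
    # Sampling: combine a conditional and an unconditional prediction.
    print(guided_prediction([2.0, 0.0], [1.0, 0.0], scale=2.5))
```

Because the masked samples train the network on an empty condition, a single model can produce both predictions needed by `guided_prediction`. This explains the mask on c only; it does not by itself say why time t goes through an MLP.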