
Missing activation for the time embedding inside ResidualBlock for DDPM?

Open EliasNehme opened this issue 2 years ago • 3 comments

In the DDPM UNet implementation, the residual blocks incorporate the time embedding by applying only a linear layer, with no preceding activation: https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/b1f5c8e3a5f08bb195698b0410340b1dc2d8c821/labml_nn/diffusion/ddpm/unet.py#L130

However, the positionally encoded time embedding is itself already the output of a linear layer: https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/b1f5c8e3a5f08bb195698b0410340b1dc2d8c821/labml_nn/diffusion/ddpm/unet.py#L80

Hence, these two layers collapse into a single linear layer, and no nonlinear mapping is applied to the time embedding per residual block.
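To spell out the collapse, here is a minimal sketch (the dimensions are illustrative, not the repo's actual sizes): composing two linear layers without an activation in between is exactly equivalent to one linear layer with composed weights.

```python
import torch
import torch.nn as nn

# Two stacked linear layers with no activation in between ...
lin1 = nn.Linear(64, 256)
lin2 = nn.Linear(256, 256)

# ... are equivalent to a single linear layer whose weight and bias
# are the composition of the two:
#   lin2(lin1(x)) = W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2)
combined = nn.Linear(64, 256)
with torch.no_grad():
    combined.weight.copy_(lin2.weight @ lin1.weight)
    combined.bias.copy_(lin2.weight @ lin1.bias + lin2.bias)

x = torch.randn(8, 64)
assert torch.allclose(lin2(lin1(x)), combined(x), atol=1e-5)
```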

In the original TensorFlow implementation by the author, the time embedding is first passed through a nonlinearity and only then through the per-block linear layer: https://github.com/hojonathanho/diffusion/blob/1e0dceb3b3495bbe19116a5e1b3596cd0706c543/diffusion_tf/models/unet.py#L49
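For reference, one way the fix could look in PyTorch, mirroring the TF ordering (the class name `TimeEmbeddingProjection` and channel sizes here are illustrative, not the repo's actual code):

```python
import torch
from torch import nn


class Swish(nn.Module):
    # x * sigmoid(x); the activation the repo already uses elsewhere
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.sigmoid(x)


class TimeEmbeddingProjection(nn.Module):
    """Hypothetical stand-in for the time-embedding branch of ResidualBlock.

    Applies a nonlinearity *before* the per-block linear layer,
    mirroring the original TF code: Dense(nonlinearity(temb)).
    """

    def __init__(self, time_channels: int, out_channels: int):
        super().__init__()
        self.time_act = Swish()
        self.time_emb = nn.Linear(time_channels, out_channels)

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        # The activation breaks the linear-linear collapse described above
        return self.time_emb(self.time_act(t))


# Shape check with illustrative sizes
proj = TimeEmbeddingProjection(time_channels=256, out_channels=64)
t = torch.randn(8, 256)
print(proj(t).shape)  # torch.Size([8, 64])
```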

EliasNehme avatar Jan 30 '23 10:01 EliasNehme

Yes, you are correct; we have missed the activation layer.

vpj avatar Feb 02 '23 08:02 vpj

Beebug missing Moth

Siimarras avatar Mar 26 '23 09:03 Siimarras

Exactly.

Siimarras avatar Mar 26 '23 09:03 Siimarras