
Missing activation for the time embedding inside ResidualBlock for DDPM?

Open EliasNehme opened this issue 2 years ago • 3 comments

In the DDPM UNet implementation, the residual blocks incorporate the time embedding by applying only a linear layer, with no preceding activation: https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/b1f5c8e3a5f08bb195698b0410340b1dc2d8c821/labml_nn/diffusion/ddpm/unet.py#L130

However, the positionally encoded time embedding is itself already the output of a linear layer: https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/b1f5c8e3a5f08bb195698b0410340b1dc2d8c821/labml_nn/diffusion/ddpm/unet.py#L80

Hence, these two layers collapse into a single linear layer, and no nonlinear mapping is applied to the time embedding per residual block.
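To spell out the collapse, here is a minimal sketch (the dimensions are illustrative, not the repo's actual sizes): composing two linear layers without an activation in between is exactly equivalent to one linear layer with composed weights.

```python
import torch
import torch.nn as nn

# Two stacked linear layers with no activation in between ...
lin1 = nn.Linear(64, 256)
lin2 = nn.Linear(256, 256)

# ... are equivalent to a single linear layer whose weight and bias
# are the composition of the two:
#   lin2(lin1(x)) = W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2)
combined = nn.Linear(64, 256)
with torch.no_grad():
    combined.weight.copy_(lin2.weight @ lin1.weight)
    combined.bias.copy_(lin2.weight @ lin1.bias + lin2.bias)

x = torch.randn(8, 64)
assert torch.allclose(lin2(lin1(x)), combined(x), atol=1e-5)
```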

In the original TensorFlow implementation by the author, the time embedding is first passed through a nonlinearity and only then through the per-block linear layer: https://github.com/hojonathanho/diffusion/blob/1e0dceb3b3495bbe19116a5e1b3596cd0706c543/diffusion_tf/models/unet.py#L49
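For reference, one way the fix could look in PyTorch, mirroring the TF ordering (the class name `TimeEmbeddingProjection` and channel sizes here are illustrative, not the repo's actual code):

```python
import torch
from torch import nn


class Swish(nn.Module):
    # x * sigmoid(x); the activation the repo already uses elsewhere
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.sigmoid(x)


class TimeEmbeddingProjection(nn.Module):
    """Hypothetical stand-in for the time-embedding branch of ResidualBlock.

    Applies a nonlinearity *before* the per-block linear layer,
    mirroring the original TF code: Dense(nonlinearity(temb)).
    """

    def __init__(self, time_channels: int, out_channels: int):
        super().__init__()
        self.time_act = Swish()
        self.time_emb = nn.Linear(time_channels, out_channels)

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        # The activation breaks the linear-linear collapse described above
        return self.time_emb(self.time_act(t))


# Shape check with illustrative sizes
proj = TimeEmbeddingProjection(time_channels=256, out_channels=64)
t = torch.randn(8, 256)
print(proj(t).shape)  # torch.Size([8, 64])
```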

EliasNehme avatar Jan 30 '23 10:01 EliasNehme

Yes, you are correct; we have missed the activation layer.

vpj avatar Feb 02 '23 08:02 vpj

Beebug missing Moth

Siimarras avatar Mar 26 '23 09:03 Siimarras

Exactly.

Siimarras avatar Mar 26 '23 09:03 Siimarras