annotated_deep_learning_paper_implementations
🧑🏫 60 Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gan...
I don't see it anywhere. Is the code referencing it via some external link somewhere in this repo?
Hi, I was reading through your implementation of HyperLSTM and the associated paper. I got lost in the shaping of the layers after the first layer. Could you please explain...
Title: Request for Implementation of Mnemosyne: Learning to Train Transformers with Transformers in PyTorch. Description: I would like to request the implementation of the "Mnemosyne: Learning to Train...
I tried to run all .py files inside the samples folder. The generate.py and llm_int8.py files worked fine, however, the finetune.py crashed https://app.labml.ai/run/b97204eaa95611eda6ae9bc880f62bb5 with error: Traceback (most recent call last):...
Hi, thanks for the nice annotated code! I looked at other implementations and they don't have an activation in the ToRGB module. Is this intended (or is it applied elsewhere and I...
In the DDPM UNet implementation, the residual blocks incorporate the time embedding by applying only a linear layer, with no prior activation: https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/b1f5c8e3a5f08bb195698b0410340b1dc2d8c821/labml_nn/diffusion/ddpm/unet.py#L130 However, the positionally encoded time embedding is...
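For context, here is a minimal sketch (my own simplification, not the repo's exact code) of the pattern the question is about: a residual block that injects the time embedding through a linear projection, where one could optionally apply an activation to the embedding first.

```python
import torch
import torch.nn as nn

class TimeEmbedResBlock(nn.Module):
    """Simplified DDPM-style residual block with a time-embedding projection.
    The name and structure are illustrative, not the repo's actual class."""

    def __init__(self, channels: int, time_dim: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.act = nn.SiLU()
        # Linear layer projecting the time embedding to the channel count;
        # whether an activation precedes this projection is the point at issue.
        self.time_proj = nn.Linear(time_dim, channels)

    def forward(self, x: torch.Tensor, t_emb: torch.Tensor) -> torch.Tensor:
        h = self.act(self.conv1(x))
        # Add the projected time embedding per-channel. The variant discussed
        # above would use self.time_proj(self.act(t_emb)) instead.
        h = h + self.time_proj(t_emb)[:, :, None, None]
        h = self.conv2(self.act(h))
        return x + h
```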
In the StyleGAN2 paper and the code based on it, it is mentioned that when using the lazy regularization technique, the regularization terms should be multiplied "by k to balance the overall magnitude...
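A rough sketch of the lazy regularization schedule being discussed (the function and names are mine, not StyleGAN2's code): the expensive penalty is computed only every k steps and scaled by k, so its average contribution matches applying it every step.

```python
import torch

def lazy_reg_loss(main_loss: torch.Tensor, reg_fn, step: int, k: int = 16) -> torch.Tensor:
    """Illustrative helper: add a regularization penalty only every k-th step.

    reg_fn would compute e.g. an R1 or path-length penalty; multiplying by k
    keeps the overall magnitude the same as evaluating it at every step.
    """
    if step % k == 0:
        return main_loss + k * reg_fn()
    return main_loss
```

Over any window of k steps, the total added penalty equals k times a single evaluation, i.e. the same total as evaluating it once per step.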
Do you have code for BERT?
Hi! In the original paper implementation they are using dims `[1:]` : `x = x_padded[1:].view_as(x)` [their code](https://github.com/kimiyoung/transformer-xl/blob/master/pytorch/mem_transformer.py#L201) but in your implementation you are using `[:-1]`: `x = x_padded[:-1].view_as(x)` [your code](https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/transformers/xl/relative_mha.py#LL38C5-L38C33)...
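For anyone comparing the two, here is a small sketch (my own reconstruction, simplified to a 2-D tensor) showing both padding conventions side by side: they differ in whether the zero column is appended or prepended, which is why one implementation slices `[:-1]` and the other `[1:]`.

```python
import torch

def shift_labml(x: torch.Tensor) -> torch.Tensor:
    # Append a zero column, reshape, drop the last row:
    # each row is shifted right by its index (first row unchanged).
    zero_pad = x.new_zeros(x.shape[0], 1)
    x_padded = torch.cat([x, zero_pad], dim=1)
    return x_padded.view(x.shape[1] + 1, x.shape[0])[:-1].view_as(x)

def shift_xl(x: torch.Tensor) -> torch.Tensor:
    # Prepend a zero column, reshape, drop the first row:
    # each row is shifted left, so the last row is unchanged.
    zero_pad = x.new_zeros(x.shape[0], 1)
    x_padded = torch.cat([zero_pad, x], dim=1)
    return x_padded.view(x.shape[1] + 1, x.shape[0])[1:].view_as(x)

x = torch.arange(9.).view(3, 3)
print(shift_labml(x))  # first row stays [0., 1., 2.]
print(shift_xl(x))     # last row stays [6., 7., 8.]
```

The two variants shift in opposite directions, which matches a reversed ordering of the relative position embeddings between the two codebases.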