annotated_deep_learning_paper_implementations
🧑🏫 60 Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gan...
I don't see it anywhere. Is the code referencing it via some external link somewhere in this repo?
Hi, I was reading through your implementation of HyperLSTM and the associated paper. I got lost in the shaping of the layers after the first layer. Could you please explain...
Title: Request for Implementation of Mnemosyne: Learning to Train Transformers with Transformers in PyTorch. Description: I would like to request the implementation of the "Mnemosyne: Learning to Train...
I tried to run all .py files inside the samples folder. The generate.py and llm_int8.py files worked fine, however, the finetune.py crashed https://app.labml.ai/run/b97204eaa95611eda6ae9bc880f62bb5 with error: Traceback (most recent call last):...
Hi, thanks for the nice annotated code! I looked at other implementations and they don't have an activation in the ToRGB module. Is this intended (or is it applied elsewhere and I...
In the DDPM UNet implementation, the residual blocks incorporate the time embedding by applying only a linear layer, with no prior activation: https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/b1f5c8e3a5f08bb195698b0410340b1dc2d8c821/labml_nn/diffusion/ddpm/unet.py#L130 However, the positionally encoded time embedding is...
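For context, here is a minimal sketch (my own simplification, not the repo's exact code) of the pattern the question is about: a residual block that injects the time embedding through a linear projection, where one could optionally apply an activation to the embedding first.

```python
import torch
import torch.nn as nn

class TimeEmbedResBlock(nn.Module):
    """Simplified DDPM-style residual block with a time-embedding projection.
    The name and structure are illustrative, not the repo's actual class."""

    def __init__(self, channels: int, time_dim: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.act = nn.SiLU()
        # Linear layer projecting the time embedding to the channel count;
        # whether an activation precedes this projection is the point at issue.
        self.time_proj = nn.Linear(time_dim, channels)

    def forward(self, x: torch.Tensor, t_emb: torch.Tensor) -> torch.Tensor:
        h = self.act(self.conv1(x))
        # Add the projected time embedding per-channel. The variant discussed
        # above would use self.time_proj(self.act(t_emb)) instead.
        h = h + self.time_proj(t_emb)[:, :, None, None]
        h = self.conv2(self.act(h))
        return x + h
```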
In the StyleGAN2 paper and the code based on it, it is mentioned that when using the lazy regularization technique, the regularization terms should be multiplied "by k to balance the overall magnitude...
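A rough sketch of the lazy regularization schedule being discussed (the function and names are mine, not StyleGAN2's code): the expensive penalty is computed only every k steps and scaled by k, so its average contribution matches applying it every step.

```python
import torch

def lazy_reg_loss(main_loss: torch.Tensor, reg_fn, step: int, k: int = 16) -> torch.Tensor:
    """Illustrative helper: add a regularization penalty only every k-th step.

    reg_fn would compute e.g. an R1 or path-length penalty; multiplying by k
    keeps the overall magnitude the same as evaluating it at every step.
    """
    if step % k == 0:
        return main_loss + k * reg_fn()
    return main_loss
```

Over any window of k steps, the total added penalty equals k times a single evaluation, i.e. the same total as evaluating it once per step.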
Do you have code for BERT?
Hi! In the original paper implementation they are using dims `[1:]` : `x = x_padded[1:].view_as(x)` [their code](https://github.com/kimiyoung/transformer-xl/blob/master/pytorch/mem_transformer.py#L201) but in your implementation you are using `[:-1]`: `x = x_padded[:-1].view_as(x)` [your code](https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/transformers/xl/relative_mha.py#LL38C5-L38C33)...
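For anyone comparing the two, here is a small sketch (my own reconstruction, simplified to a 2-D tensor) showing both padding conventions side by side: they differ in whether the zero column is appended or prepended, which is why one implementation slices `[:-1]` and the other `[1:]`.

```python
import torch

def shift_labml(x: torch.Tensor) -> torch.Tensor:
    # Append a zero column, reshape, drop the last row:
    # each row is shifted right by its index (first row unchanged).
    zero_pad = x.new_zeros(x.shape[0], 1)
    x_padded = torch.cat([x, zero_pad], dim=1)
    return x_padded.view(x.shape[1] + 1, x.shape[0])[:-1].view_as(x)

def shift_xl(x: torch.Tensor) -> torch.Tensor:
    # Prepend a zero column, reshape, drop the first row:
    # each row is shifted left, so the last row is unchanged.
    zero_pad = x.new_zeros(x.shape[0], 1)
    x_padded = torch.cat([zero_pad, x], dim=1)
    return x_padded.view(x.shape[1] + 1, x.shape[0])[1:].view_as(x)

x = torch.arange(9.).view(3, 3)
print(shift_labml(x))  # first row stays [0., 1., 2.]
print(shift_xl(x))     # last row stays [6., 7., 8.]
```

The two variants shift in opposite directions, which matches a reversed ordering of the relative position embeddings between the two codebases.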