annotated-transformer
An annotated implementation of the Transformer paper.
Not sure what is wrong, any suggestions? I get the following warnings:

```
/usr/local/lib/python3.7/dist-packages/torch/nn/_reduction.py:42: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
  warnings.warn(warning.format(ret))
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:20: UserWarning: nn.init.xavier_uniform is now deprecated in favor of...
```
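Both messages point at renamed PyTorch APIs rather than a real bug. A hedged sketch of the two substitutions, assuming the warnings come from the notebook's `LabelSmoothing` criterion and the initialization loop in `make_model`:

```python
import torch.nn as nn

def build_criterion_and_init(model: nn.Module) -> nn.KLDivLoss:
    # old: nn.KLDivLoss(size_average=False); the warning suggests reduction="sum"
    criterion = nn.KLDivLoss(reduction="sum")
    # old: nn.init.xavier_uniform(p); renamed to the in-place nn.init.xavier_uniform_
    for p in model.parameters():
        if p.dim() > 1:
            nn.init.xavier_uniform_(p)
    return criterion
```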
I ran the code on Google Colab. When _building the German vocabulary_ here:

```
if is_interactive_notebook():
    # global variables used later in the script
    spacy_de, spacy_en = show_example(load_tokenizers)
    vocab_src, vocab_tgt = ...
```
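Since the error itself is cut off, one guess (purely an assumption): the spaCy language models that `load_tokenizers` expects may not be installed on a fresh Colab runtime. A minimal check:

```python
import spacy

# Download the models the notebook's tokenizers rely on, if they are missing.
for name in ("de_core_news_sm", "en_core_web_sm"):
    try:
        spacy.load(name)
    except OSError:
        spacy.cli.download(name)  # programmatic equivalent of `python -m spacy download`
```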
First of all: thank you for this work, it is really easy to follow along with this notebook. My question is the following: in the `MultiHeadedAttention` class, you instantiate 4 affine...
```
class MultiHeadedAttention(nn.Module):
    def __init__(self, h, d_model, dropout=0.1):
        "Take in model size and number of heads."
        super(MultiHeadedAttention, self).__init__()
        assert d_model % h == 0
        # We assume d_v always equals d_k
        ...
```
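Since the snippet is cut off above, here is a sketch of how those 4 linear layers are used in the notebook's forward pass: the first three project the query, key, and value, and the fourth (`self.linears[-1]`) is the output projection applied after the heads are concatenated. `clones` and `attention` are the notebook's helpers, reproduced so the sketch is self-contained:

```python
import copy
import math
import torch
import torch.nn as nn

def clones(module, N):
    "Produce N identical layers (helper from the notebook)."
    return nn.ModuleList([copy.deepcopy(module) for _ in range(N)])

def attention(query, key, value, mask=None, dropout=None):
    "Scaled dot-product attention, as defined in the notebook."
    d_k = query.size(-1)
    scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, -1e9)
    p_attn = scores.softmax(dim=-1)
    if dropout is not None:
        p_attn = dropout(p_attn)
    return torch.matmul(p_attn, value), p_attn

class MultiHeadedAttention(nn.Module):
    def __init__(self, h, d_model, dropout=0.1):
        super().__init__()
        assert d_model % h == 0
        self.d_k = d_model // h
        self.h = h
        # 4 linears: W_Q, W_K, W_V for the projections, plus W_O for the output
        self.linears = clones(nn.Linear(d_model, d_model), 4)
        self.attn = None
        self.dropout = nn.Dropout(p=dropout)

    def forward(self, query, key, value, mask=None):
        if mask is not None:
            mask = mask.unsqueeze(1)  # same mask applied to all h heads
        nbatches = query.size(0)
        # 1) The first 3 of the 4 linears project query, key, and value.
        query, key, value = [
            lin(x).view(nbatches, -1, self.h, self.d_k).transpose(1, 2)
            for lin, x in zip(self.linears, (query, key, value))
        ]
        # 2) Apply attention on all the projected vectors in the batch.
        x, self.attn = attention(query, key, value, mask=mask, dropout=self.dropout)
        # 3) Concatenate heads and apply the 4th linear, the output projection W_O.
        x = x.transpose(1, 2).contiguous().view(nbatches, -1, self.h * self.d_k)
        return self.linears[-1](x)
```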
```
if False:
    model.src_embed[0].lut.weight = model.tgt_embeddings[0].lut.weight
    model.generator.lut.weight = model.tgt_embed[0].lut.weight
```

Hi, I can't find `tgt_embeddings` in your code. Maybe it should be `model.src_embed[0].lut.weight = model.tgt_embed[0].lut.weight`. And if the embeddings are shared, should the...
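A hedged sketch of what the corrected tying would probably look like, assuming (per the notebook) the target embedding lives at `model.tgt_embed[0].lut` and the `Generator`'s output layer is named `proj` rather than `lut`:

```python
# Assumption: model was built by make_model, so Generator defines self.proj.
model.src_embed[0].lut.weight = model.tgt_embed[0].lut.weight  # tgt_embeddings -> tgt_embed
model.generator.proj.weight = model.tgt_embed[0].lut.weight    # generator has proj, not lut
```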
Thanks for the great resource!
According to what you wrote: _“That is, the output of each sub-layer is $\mathrm{LayerNorm}(x + \mathrm{Sublayer}(x))$, where $\mathrm{Sublayer}(x)$ is the function implemented by the sub-layer itself. We apply dropout [(cite)](http://jmlr.org/papers/v15/srivastava14a.html)...
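For comparison, the notebook's `SublayerConnection` applies the norm *before* the sub-layer, i.e. it computes $x + \mathrm{Dropout}(\mathrm{Sublayer}(\mathrm{LayerNorm}(x)))$ rather than the post-norm formula quoted above. A sketch of that class (using `nn.LayerNorm` in place of the notebook's custom `LayerNorm` so it runs standalone):

```python
import torch.nn as nn

class SublayerConnection(nn.Module):
    """
    Residual connection followed by a layer norm (notebook version).
    Note: for code simplicity the norm is applied first, not last, so this
    is pre-norm rather than the post-norm formula quoted from the paper.
    """
    def __init__(self, size, dropout):
        super().__init__()
        self.norm = nn.LayerNorm(size)  # the notebook uses its own LayerNorm class
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, sublayer):
        "Apply residual connection to any sublayer with the same size."
        return x + self.dropout(sublayer(self.norm(x)))
```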
In the `MultiHeadedAttention` class the line `self.linears = clones(nn.Linear(d_model, d_model), 4)` occurs, but shouldn't it be a 3 instead of a 4: `self.linears = clones(nn.Linear(d_model, d_model), 3)`? Am I correct?
The original [paper](https://arxiv.org/pdf/1607.06450.pdf) computes a biased estimate of the sample standard deviation. However, by default `torch.Tensor.std()` uses an unbiased estimate ([Ref](https://pytorch.org/docs/1.11/generated/torch.Tensor.std.html?highlight=torch%20std#torch.Tensor.std)). Therefore, it is necessary to use `torch.Tensor.std(-1, unbiased=False)`. Moreover, the class...
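A sketch of the notebook's `LayerNorm` with the suggested fix applied; only the `unbiased=False` argument changes:

```python
import torch
import torch.nn as nn

class LayerNorm(nn.Module):
    "Notebook's LayerNorm with the biased-estimator fix suggested above."
    def __init__(self, features, eps=1e-6):
        super().__init__()
        self.a_2 = nn.Parameter(torch.ones(features))
        self.b_2 = nn.Parameter(torch.zeros(features))
        self.eps = eps

    def forward(self, x):
        mean = x.mean(-1, keepdim=True)
        # unbiased=False gives the biased estimate used in the LayerNorm paper
        std = x.std(-1, keepdim=True, unbiased=False)
        return self.a_2 * (x - mean) / (std + self.eps) + self.b_2
```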
Hi, great notebook! Just wanted to mention that there is no need to pass the `generator` to the constructor of the `EncoderDecoder` class. It makes it a bit confusing as...
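To illustrate the point: in the notebook's `EncoderDecoder`, `generator` is stored but never used by `forward`; it is only called later, during loss computation. A trimmed sketch:

```python
import torch.nn as nn

class EncoderDecoder(nn.Module):
    "Standard encoder-decoder architecture, as in the notebook."
    def __init__(self, encoder, decoder, src_embed, tgt_embed, generator):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder
        self.src_embed = src_embed
        self.tgt_embed = tgt_embed
        self.generator = generator  # stored here but only referenced outside forward

    def forward(self, src, tgt, src_mask, tgt_mask):
        "Take in and process masked src and target sequences."
        return self.decode(self.encode(src, src_mask), src_mask, tgt, tgt_mask)

    def encode(self, src, src_mask):
        return self.encoder(self.src_embed(src), src_mask)

    def decode(self, memory, src_mask, tgt, tgt_mask):
        return self.decoder(self.tgt_embed(tgt), memory, src_mask, tgt_mask)
```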