
Transformer: PyTorch Implementation of "Attention Is All You Need"

Results: 15 transformer issues, sorted by recently updated

@hyunwoongko thanks for your nice implementation. By the way, I want to point out an issue. While testing, you are using the following code: ```python def test_model(num_examples): iterator...

In section 5.4 of the [original paper](https://arxiv.org/pdf/1706.03762.pdf): > We apply dropout to the output of each sub-layer, before it is added to the sub-layer input and normalized.
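The wiring the quoted sentence describes can be shown in a few lines. Below is a minimal post-norm sketch (the class name `SublayerConnection` and argument names are only illustrative, not taken from this repo): dropout is applied to the sub-layer output, then the residual is added, then `LayerNorm` runs.

```python
import torch
from torch import nn

class SublayerConnection(nn.Module):
    """Post-norm residual wiring per Section 5.4: norm(x + dropout(sublayer(x)))."""

    def __init__(self, d_model: int, drop_prob: float = 0.1):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(p=drop_prob)

    def forward(self, x: torch.Tensor, sublayer) -> torch.Tensor:
        # Dropout hits the sub-layer output *before* it is added to the
        # sub-layer input and normalized, exactly as the quote says.
        return self.norm(x + self.dropout(sublayer(x)))
```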

There are two implementations of `LayerNorm`. (See PyTorch documentation: ) (1) Without learnable per-element affine parameters: ![](https://user-images.githubusercontent.com/68557794/150056919-1c3e2cf4-17b6-4c18-a0c4-f361f782d42a.png) (2) With learnable per-element affine parameters: ![](https://user-images.githubusercontent.com/68557794/150056933-86832852-ec98-4d29-928e-1b80c729fa54.png) According to the original...
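For reference, the two variants the screenshots compare map directly onto `torch.nn.LayerNorm` via the `elementwise_affine` flag; the sketch below assumes `d_model = 512` and only illustrates which parameters each variant owns.

```python
import torch
from torch import nn

x = torch.randn(2, 10, 512)

# (1) Without learnable per-element affine parameters: pure normalization.
ln_plain = nn.LayerNorm(512, elementwise_affine=False)

# (2) With learnable gamma/beta (PyTorch's default), which is what the
#     original Transformer uses.
ln_affine = nn.LayerNorm(512)

print(sum(p.numel() for p in ln_plain.parameters()))   # 0
print(sum(p.numel() for p in ln_affine.parameters()))  # 1024 (weight + bias)
```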

1. In this file https://github.com/hyunwoongko/transformer/blob/master/models/model/transformer.py, you define the functions `make_pad_mask` and `make_no_peak_mask`, but are they actually used during training? 2. In this file, https://github.com/hyunwoongko/transformer/blob/master/models/layers/position_wise_feed_forward.py, why does your `PositionwiseFeedForward` have extra...
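For anyone else reading along, the two masks usually look roughly like the sketch below; this is a simplified guess at their intent, not the repo's exact code (argument names and shapes may differ).

```python
import torch

def make_pad_mask(q, k, pad_idx=1):
    # q: (batch, q_len), k: (batch, k_len) as token indices.
    # True where the key token is real, False where it is padding,
    # broadcastable over scores of shape (batch, heads, q_len, k_len).
    mask = (k != pad_idx).unsqueeze(1).unsqueeze(2)    # (batch, 1, 1, k_len)
    return mask.expand(-1, -1, q.size(1), -1)

def make_no_peak_mask(q, k):
    # Lower-triangular (causal) mask: position i may only attend to j <= i.
    return torch.tril(torch.ones(q.size(1), k.size(1), dtype=torch.bool))
```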

I'm new to the Transformer, and I know there is official documentation, but it doesn't solve my problem. Can someone help me migrate from torchtext 0.9 to the new version?
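In case it helps, here is one possible sketch of the newer (0.12+) torchtext style that replaces the legacy `Field`/`BucketIterator` pipeline. It is an assumption about how the migration could look, not a drop-in patch for this repo.

```python
from torchtext.data.utils import get_tokenizer
from torchtext.datasets import Multi30k
from torchtext.vocab import build_vocab_from_iterator

# spaCy tokenizers (requires: python -m spacy download de_core_news_sm en_core_web_sm)
tokenize_de = get_tokenizer("spacy", language="de_core_news_sm")
tokenize_en = get_tokenizer("spacy", language="en_core_web_sm")

def yield_tokens(data_iter, tokenizer, index):
    for pair in data_iter:
        yield tokenizer(pair[index])

# Build the source-side vocabulary from the raw training iterator.
train_iter = Multi30k(split="train", language_pair=("de", "en"))
vocab_de = build_vocab_from_iterator(
    yield_tokens(train_iter, tokenize_de, 0),
    specials=["<unk>", "<pad>", "<sos>", "<eos>"],
)
vocab_de.set_default_index(vocab_de["<unk>"])
```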

In train.py the size of batch.trg is [118, 35], so the for loop will definitely lead to an out-of-bounds index. ``` total_bleu = [] for j in range(batch_size): try: trg_words =...
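A common fix is to loop over the tensor's actual batch dimension instead of the configured `batch_size`, so a final smaller batch cannot index out of bounds. A sketch, assuming `batch.trg` is laid out as `(batch, seq_len)`; use `size(1)` if the repo keeps it as `(seq_len, batch)`.

```python
total_bleu = []
for j in range(batch.trg.size(0)):   # actual number of sentences in this batch
    try:
        trg_words = ...  # unchanged body from train.py
    except IndexError:
        continue
```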

I'm new to the Transformer and don't know how to get the dataset used in this project. Please provide a Linux script if you can.
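If the project trains on Multi30k (German-English), which is my assumption here, a small Python script using torchtext can fetch the splits and dump them to plain text; adjust the paths and language pair to whatever the repo's data loader expects.

```python
from torchtext.datasets import Multi30k

# Download each split and write parallel plain-text files next to the script.
for split in ("train", "valid", "test"):
    with open(f"{split}.de", "w", encoding="utf-8") as f_de, \
         open(f"{split}.en", "w", encoding="utf-8") as f_en:
        for de, en in Multi30k(split=split, language_pair=("de", "en")):
            f_de.write(de.strip() + "\n")
            f_en.write(en.strip() + "\n")
```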

Is this a shallow copy, making `_x` and `x` exactly the same one? https://github.com/hyunwoongko/transformer/blob/0e5ce57589d7307cf76b53241cc523841ff67655/models/blocks/encoder_layer.py#L27
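A short illustration of what actually happens there: `_x = x` binds a second name to the same tensor (no copy at all), but the residual connection still works because the sub-layer returns a new tensor and `x` is rebound to it, while `_x` keeps referencing the original input.

```python
import torch

x = torch.randn(2, 4)
_x = x                     # no copy: both names point at the same tensor
assert _x is x

x = x * 2                  # out-of-place op rebinds x to a brand-new tensor
assert _x is not x         # _x still holds the original, un-doubled values

# So `_x = x; x = attention(...); x = x + _x` is safe, as long as the
# sub-layer does not modify its input in place.
```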

Hello, author. I sincerely hope you can answer when you see this. I urgently want to understand why Q, K, and V are the inputs to multi-head attention and...
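For the Q/K/V question, a minimal scaled dot-product attention sketch may help: the query asks "what am I looking for", the keys say "what do I contain", and the values are what actually gets mixed. In self-attention all three are projections of the same sequence; in encoder-decoder attention Q comes from the decoder while K and V come from the encoder.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, d_k)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # query-key similarity
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)  # attention distribution over keys
    return weights @ v                       # values mixed by those weights
```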

Output of `pip show torch`: Name: torch, Version: 1.13.0, Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration, Home-page: https://pytorch.org/, Author: PyTorch Team, Author-email: [email protected], License: BSD-3, Location:...