
Transformer: PyTorch Implementation of "Attention Is All You Need"

Results: 15 transformer issues, sorted by recently updated

@hyunwoongko thanks for your nice implementation. By the way, I want to point out an issue. While testing, you are using the following code: ```python def test_model(num_examples): iterator...

In section 5.4 of the [original paper](https://arxiv.org/pdf/1706.03762.pdf): > We apply dropout to the output of each sub-layer, before it is added to the sub-layer input and normalized.
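The wiring the quoted sentence describes can be shown in a few lines. Below is a minimal post-norm sketch (the class name `SublayerConnection` and argument names are only illustrative, not taken from this repo): dropout is applied to the sub-layer output, then the residual is added, then `LayerNorm` runs.

```python
import torch
from torch import nn

class SublayerConnection(nn.Module):
    """Post-norm residual wiring per Section 5.4: norm(x + dropout(sublayer(x)))."""

    def __init__(self, d_model: int, drop_prob: float = 0.1):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(p=drop_prob)

    def forward(self, x: torch.Tensor, sublayer) -> torch.Tensor:
        # Dropout hits the sub-layer output *before* it is added to the
        # sub-layer input and normalized, exactly as the quote says.
        return self.norm(x + self.dropout(sublayer(x)))
```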

There are two implementations of `LayerNorm`. (See PyTorch documentation: ) (1) Without learnable per-element affine parameters: ![](https://user-images.githubusercontent.com/68557794/150056919-1c3e2cf4-17b6-4c18-a0c4-f361f782d42a.png) (2) With learnable per-element affine parameters: ![](https://user-images.githubusercontent.com/68557794/150056933-86832852-ec98-4d29-928e-1b80c729fa54.png) According to the original...
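For reference, the two variants the screenshots compare map directly onto `torch.nn.LayerNorm` via the `elementwise_affine` flag; the sketch below assumes `d_model = 512` and only illustrates which parameters each variant owns.

```python
import torch
from torch import nn

x = torch.randn(2, 10, 512)

# (1) Without learnable per-element affine parameters: pure normalization.
ln_plain = nn.LayerNorm(512, elementwise_affine=False)

# (2) With learnable gamma/beta (PyTorch's default), which is what the
#     original Transformer uses.
ln_affine = nn.LayerNorm(512)

print(sum(p.numel() for p in ln_plain.parameters()))   # 0
print(sum(p.numel() for p in ln_affine.parameters()))  # 1024 (weight + bias)
```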

1. In this file https://github.com/hyunwoongko/transformer/blob/master/models/model/transformer.py, you define the functions `make_pad_mask` and `make_no_peak_mask`, but are they actually used during training? 2. In this file, https://github.com/hyunwoongko/transformer/blob/master/models/layers/position_wise_feed_forward.py, why does your `PositionwiseFeedForward` have extra...
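For anyone else reading along, the two masks usually look roughly like the sketch below; this is a simplified guess at their intent, not the repo's exact code (argument names and shapes may differ).

```python
import torch

def make_pad_mask(q, k, pad_idx=1):
    # q: (batch, q_len), k: (batch, k_len) as token indices.
    # True where the key token is real, False where it is padding,
    # broadcastable over scores of shape (batch, heads, q_len, k_len).
    mask = (k != pad_idx).unsqueeze(1).unsqueeze(2)    # (batch, 1, 1, k_len)
    return mask.expand(-1, -1, q.size(1), -1)

def make_no_peak_mask(q, k):
    # Lower-triangular (causal) mask: position i may only attend to j <= i.
    return torch.tril(torch.ones(q.size(1), k.size(1), dtype=torch.bool))
```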

I'm new to the Transformer, and I know there is official documentation, but it doesn't solve my problem. Can someone help me migrate from torchtext 0.9 to the new version?
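In case it helps, here is one possible sketch of the newer (0.12+) torchtext style that replaces the legacy `Field`/`BucketIterator` pipeline. It is an assumption about how the migration could look, not a drop-in patch for this repo.

```python
from torchtext.data.utils import get_tokenizer
from torchtext.datasets import Multi30k
from torchtext.vocab import build_vocab_from_iterator

# spaCy tokenizers (requires: python -m spacy download de_core_news_sm en_core_web_sm)
tokenize_de = get_tokenizer("spacy", language="de_core_news_sm")
tokenize_en = get_tokenizer("spacy", language="en_core_web_sm")

def yield_tokens(data_iter, tokenizer, index):
    for pair in data_iter:
        yield tokenizer(pair[index])

# Build the source-side vocabulary from the raw training iterator.
train_iter = Multi30k(split="train", language_pair=("de", "en"))
vocab_de = build_vocab_from_iterator(
    yield_tokens(train_iter, tokenize_de, 0),
    specials=["<unk>", "<pad>", "<sos>", "<eos>"],
)
vocab_de.set_default_index(vocab_de["<unk>"])
```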

In train.py the size of batch.trg is [118, 35], so the for loop will definitely lead to an out-of-bounds index. ``` total_bleu = [] for j in range(batch_size): try: trg_words =...
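A common fix is to loop over the tensor's actual batch dimension instead of the configured `batch_size`, so a final smaller batch cannot index out of bounds. A sketch, assuming `batch.trg` is laid out as `(batch, seq_len)`; use `size(1)` if the repo keeps it as `(seq_len, batch)`.

```python
total_bleu = []
for j in range(batch.trg.size(0)):   # actual number of sentences in this batch
    try:
        trg_words = ...  # unchanged body from train.py
    except IndexError:
        continue
```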

I'm new to the Transformer and don't know how to get the dataset used in this project. Please provide a Linux script if you can.
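If the project trains on Multi30k (German-English), which is my assumption here, a small Python script using torchtext can fetch the splits and dump them to plain text; adjust the paths and language pair to whatever the repo's data loader expects.

```python
from torchtext.datasets import Multi30k

# Download each split and write parallel plain-text files next to the script.
for split in ("train", "valid", "test"):
    with open(f"{split}.de", "w", encoding="utf-8") as f_de, \
         open(f"{split}.en", "w", encoding="utf-8") as f_en:
        for de, en in Multi30k(split=split, language_pair=("de", "en")):
            f_de.write(de.strip() + "\n")
            f_en.write(en.strip() + "\n")
```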

Is this a shallow copy, making `_x` and `x` exactly the same one? https://github.com/hyunwoongko/transformer/blob/0e5ce57589d7307cf76b53241cc523841ff67655/models/blocks/encoder_layer.py#L27
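A short illustration of what actually happens there: `_x = x` binds a second name to the same tensor (no copy at all), but the residual connection still works because the sub-layer returns a new tensor and `x` is rebound to it, while `_x` keeps referencing the original input.

```python
import torch

x = torch.randn(2, 4)
_x = x                     # no copy: both names point at the same tensor
assert _x is x

x = x * 2                  # out-of-place op rebinds x to a brand-new tensor
assert _x is not x         # _x still holds the original, un-doubled values

# So `_x = x; x = attention(...); x = x + _x` is safe, as long as the
# sub-layer does not modify its input in place.
```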

Hello, author. I sincerely hope you can answer when you see this. I urgently want to understand why Q, K, and V are the inputs to multi-head attention and...
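For the Q/K/V question, a minimal scaled dot-product attention sketch may help: the query asks "what am I looking for", the keys say "what do I contain", and the values are what actually gets mixed. In self-attention all three are projections of the same sequence; in encoder-decoder attention Q comes from the decoder while K and V come from the encoder.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, d_k)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # query-key similarity
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)  # attention distribution over keys
    return weights @ v                       # values mixed by those weights
```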

Output of `pip show torch`: Name: torch, Version: 1.13.0, Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration, Home-page: https://pytorch.org/, Author: PyTorch Team, Author-email: [email protected], License: BSD-3, Location:...