Transformer-backbone
This repository is a reproduction of the Transformer architecture from the paper "Attention Is All You Need".
The aim of this repository is to give readers insight into the details of a Transformer implementation without being bogged down by data preprocessing.
The structure of the Transformer is illustrated below.

We therefore build the network hierarchically. From the top level down, the hierarchy is:
Transformer -> Fused_Embedding / Encoder / Decoder -> Encoder_layer / Decoder_layer -> Multi-Headed Attention / Position-Wise Feed-Forward Network
The corresponding file/module tree is shown below (a minimal code sketch of these building blocks follows the tree):
- Transformer.py
  - Fus_Embeddings (AggregationModel.py)
    - Word Embedding Vectors
    - Positional Encoding (Modules.py)
  - Encoder (AggregationModel.py)
    - Encoder Layer (Model.py)
      - MultiHeadedAttention (Modules.py)
      - PostionWiseFFN (Modules.py)
  - Decoder (AggregationModel.py)
    - Decoder Layer (Model.py)
      - MultiHeadedAttention (Modules.py)
      - PostionWiseFFN (Modules.py)
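To make the hierarchy concrete, here is a minimal, self-contained sketch of how the bottom two levels of the tree typically compose in PyTorch. The class names mirror the tree above, but the constructor signatures and implementation details are illustrative assumptions, not the actual code in Modules.py or Model.py; the decoder layer is analogous, with an additional masked self-attention and an encoder-decoder attention sub-layer.

```python
import math
import torch
import torch.nn as nn

class MultiHeadedAttention(nn.Module):
    """Scaled dot-product attention run in parallel over several heads."""
    def __init__(self, d_model, n_heads):
        super().__init__()
        assert d_model % n_heads == 0
        self.d_k, self.n_heads = d_model // n_heads, n_heads
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, q, k, v, mask=None):
        b = q.size(0)
        # Project and split into heads: (batch, heads, seq_len, d_k)
        q, k, v = [
            w(x).view(b, -1, self.n_heads, self.d_k).transpose(1, 2)
            for w, x in ((self.w_q, q), (self.w_k, k), (self.w_v, v))
        ]
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        out = torch.softmax(scores, dim=-1) @ v
        # Merge heads back to (batch, seq_len, d_model)
        out = out.transpose(1, 2).contiguous().view(b, -1, self.n_heads * self.d_k)
        return self.w_o(out)

class PostionWiseFFN(nn.Module):
    """Two linear layers applied identically at every position."""
    def __init__(self, d_model, d_ff):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))

    def forward(self, x):
        return self.net(x)

class EncoderLayer(nn.Module):
    """Self-attention + feed-forward, each wrapped in a residual connection and LayerNorm."""
    def __init__(self, d_model, n_heads, d_ff):
        super().__init__()
        self.attn = MultiHeadedAttention(d_model, n_heads)
        self.ffn = PostionWiseFFN(d_model, d_ff)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x, mask=None):
        x = self.norm1(x + self.attn(x, x, x, mask))  # residual + norm around attention
        return self.norm2(x + self.ffn(x))            # residual + norm around FFN

# Quick shape check: batch of 2 sentences, 10 tokens each, d_model = 512
x = torch.randn(2, 10, 512)
print(EncoderLayer(d_model=512, n_heads=8, d_ff=2048)(x).shape)  # torch.Size([2, 10, 512])
```

The Encoder and Decoder in AggregationModel.py then simply stack N such layers, and Fus_Embeddings adds sinusoidal positional encodings onto the word embedding vectors before the stack.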
Environment Configuration
- PyTorch 1.1.0
- Python 3.6.8
- torchtext 0.5.0
- tqdm
- dill
Usage
WMT'17 Multimodal Translation: de-en BPE
- The byte-pair encoding has already been applied, so you can focus on the structure of the Transformer itself.
- Train the model:
  python train.py -data_pkl ./bpe_deen/bpe_vocab.pkl -train_path ./bpe_deen/deen-train -val_path ./bpe_deen/deen-val -log deen_bpe -label_smoothing -save_model trained -b 256 -warmup 128000 -epoch 400
- GPU requirement: 4 TitanX
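The -warmup flag presumably sets the number of warmup steps for the learning-rate schedule described in the paper, lr = d_model^-0.5 * min(step^-0.5, step * warmup^-1.5). Below is a minimal sketch of that schedule; the function name and the way it is attached to the optimizer are illustrative assumptions, and train.py may implement it differently.

```python
import torch

def noam_lr(step, d_model=512, warmup=128000):
    """Learning-rate schedule from "Attention Is All You Need":
    linear warmup for `warmup` steps, then inverse-square-root decay."""
    step = max(step, 1)  # avoid 0 ** -0.5 on the very first call
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

# Example: attach the schedule to a plain Adam optimizer (base lr = 1.0 so the
# lambda's value becomes the effective learning rate).
model = torch.nn.Linear(512, 512)  # stand-in for the full Transformer
optimizer = torch.optim.Adam(model.parameters(), lr=1.0, betas=(0.9, 0.98), eps=1e-9)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=noam_lr)
```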
Performance
(Figures: training loss and accuracy curves)
Acknowledgement
- The data interface is borrowed from "A PyTorch implementation of the Transformer model in 'Attention is All You Need'".
- Another outstanding work, "The Annotated Transformer", inspired my coding process.