# Transformers in PyTorch

## What is provided in this repo?
- Model: several transformer-based milestone models are reimplemented from scratch in PyTorch

  ![](https://github.com/TigerJeffX/Transformers-in-pytorch/raw/main/readme_supply/ViT.png)
  ![](https://github.com/TigerJeffX/Transformers-in-pytorch/raw/main/readme_supply/vanilla_transformer.png)
- Experiments: conducted on CV and NLP benchmarks, respectively
  - Image Classification:

    ![](https://github.com/TigerJeffX/Transformers-in-pytorch/raw/main/readme_supply/imagenet1k_dataset.png)
    ![](https://github.com/TigerJeffX/Transformers-in-pytorch/raw/main/readme_supply/cifar10_dataset.png)
  - Neural Machine Translation:

    ![](https://github.com/TigerJeffX/Transformers-in-pytorch/raw/main/readme_supply/multi30k_dataset.png)
- Pipeline: end-to-end pipeline
  - Convenient to play with: data processing and model training/validation are integrated into a one-stop pipeline
  - Efficient to train: training and evaluation are accelerated via DistributedDataParallel (DDP) and fp16 mixed-precision training (see the sketch after this list)
  - Neat to read: a clean file structure that is easy to read yet non-trivial
    - ./script → run train/eval
    - ./model → model implementation
    - ./data → data processing
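
The speed-ups mentioned above come from wrapping the model in DDP and using apex's amp for fp16. Below is a minimal sketch of that setup; the model, data loader, and hyper-parameters are placeholders, not the actual objects defined in this repo.

```python
# Minimal DDP + fp16 (apex amp) training setup sketch.
# `model` and `train_loader` are placeholders; the usual RANK/WORLD_SIZE/
# MASTER_ADDR env vars are assumed to be set by the process launcher.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from apex import amp  # NVIDIA apex, installed as described in "Env Requirements"

def train(local_rank, model, train_loader, epochs=1):
    dist.init_process_group(backend="nccl")            # one process per GPU
    torch.cuda.set_device(local_rank)
    model = model.cuda(local_rank)

    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    # opt_level "O1": fp16 ops where safe, fp32 master weights
    model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
    model = DDP(model, device_ids=[local_rank])

    criterion = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in train_loader:
            images, labels = images.cuda(local_rank), labels.cuda(local_rank)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            with amp.scale_loss(loss, optimizer) as scaled_loss:
                scaled_loss.backward()                  # loss-scaled fp16 backward
            optimizer.step()
```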
## Usage

### 1. Environment Requirements
```bash
# Conda env
python 3.6.10
torch 1.4.0+cu100
torchvision 0.5.0+cu100
torchtext 0.5.0
spacy 3.4.1
tqdm 4.63.0

# Apex (for mixed-precision training)
## run `gcc --version`
gcc (GCC) 5.4.0
## apex installation
git clone https://github.com/NVIDIA/apex
cd apex
git checkout f3a960f80244cf9e80558ab30f7f7e8cbf03c0a0
rm -rf ./build
python setup.py install --cuda_ext --cpp_ext

# System env
## run `nvcc --version`
Cuda compilation tools, release 10.0, V10.0.130
## run `nvidia-smi`
Check your own GPU device status
```
### 2. Data Requirements

- multi30k and cifar10 are downloaded automatically by the pipeline
- imagenet1k (ILSVRC2012) needs to be downloaded manually (see the guide below)
  - Wait until all three files have downloaded:
    - ILSVRC2012_devkit_t12.tar.gz (2.5M)
    - ILSVRC2012_img_train.tar (138G)
    - ILSVRC2012_img_val.tar (6.3G)
  - When the imagenet1k pipeline is run, ILSVRC2012_img_train.tar and ILSVRC2012_img_val.tar are automatically unpacked and arranged into the directories 'data/ILSVRC2012/train' and 'data/ILSVRC2012/val'.
  - The unpacking takes more than a few hours; you can also do it faster yourself via shell (a sketch follows the download commands below).
```bash
# Guide for downloading imagenet1k
mkdir -p data/ILSVRC2012
cd data/ILSVRC2012
wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_devkit_t12.tar.gz --no-check-certificate
wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_train.tar --no-check-certificate
wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar --no-check-certificate
```
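
If you prefer to unpack the train archive yourself instead of waiting for the pipeline, the sketch below shows the usual approach: ILSVRC2012_img_train.tar is a tar of 1,000 per-class tars, which get extracted into per-class folders (the layout torchvision's ImageFolder expects). Paths are illustrative; this is not the repo's own unpacking code.

```python
# Sketch: unpack ILSVRC2012_img_train.tar into data/ILSVRC2012/train/<synset>/.
# Illustrative only -- the pipeline can do this for you, just slower.
import os
import tarfile

root = "data/ILSVRC2012"
train_dir = os.path.join(root, "train")
os.makedirs(train_dir, exist_ok=True)

with tarfile.open(os.path.join(root, "ILSVRC2012_img_train.tar")) as outer:
    for member in outer.getmembers():                # e.g. n01440764.tar, one per class
        class_name = os.path.splitext(member.name)[0]
        class_dir = os.path.join(train_dir, class_name)
        os.makedirs(class_dir, exist_ok=True)
        with tarfile.open(fileobj=outer.extractfile(member)) as inner:
            inner.extractall(class_dir)              # JPEGs for this class
```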
### 3. Run Experiments

#### 3.1 Fine-tuning ViT on imagenet1k and cifar10

- Download the pretrained ViT_B_16 model parameters from the official storage:
```bash
cd data
curl -o ViT_B_16.npz https://storage.googleapis.com/vit_models/augreg/B_16-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.01-res_224.npz
curl -o ViT_B_16_384.npz https://storage.googleapis.com/vit_models/augreg/B_16-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.01-res_384.npz
```
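
The checkpoints are plain numpy .npz archives from Google's AugReg release, so a quick way to sanity-check the downloads is to open them and list a few parameter arrays. The file path below matches the curl commands above; the printed key names come from the release itself, not from identifiers in this repo.

```python
# Quick sanity check of a downloaded ViT checkpoint (.npz = named numpy arrays).
import numpy as np

weights = np.load("data/ViT_B_16.npz")
print(len(weights.files), "parameter arrays")
for name in sorted(weights.files)[:5]:
    print(name, weights[name].shape)
```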
- Before running the experiments:
  - Set the CUDA env in script/run_img_cls_task.py/__main__ according to your GPU device (a minimal illustration follows the commands below)
  - Adjust the train/eval settings in script/run_img_cls_task.py/get_args() and launch the experiment:
```bash
cd script

# run experiments on cifar10
# (4 mins/epoch, about 3.5 hours in total | GPU device: P40×4)
python ./run_image_cls_task.py cifar10

# run experiments on imagenet1k
# (less than 5 hours/epoch, more than 10 hours in total | GPU device: P40×4)
python ./run_image_cls_task.py ILSVRC2012

# Tips:
# 1. Both DDP and fp16 mixed-precision training are adopted for acceleration
# 2. The actual speed-up depends on your specific GPU device
```
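
As for the "set the CUDA env" step above, it usually amounts to restricting which GPUs are visible before anything touches CUDA. The exact variable names in script/run_img_cls_task.py may differ; this is only an illustration.

```python
# Illustration: pick the GPUs before CUDA is initialized.
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"   # adjust to your device
n_gpus = len(os.environ["CUDA_VISIBLE_DEVICES"].split(","))
print("training will see %d GPU(s)" % n_gpus)
```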
#### 3.2 Training the vanilla transformer from scratch on multi30k (en → de)
- Before running the experiments:
  - Set the CUDA env in script/run_nmt_task.py/__main__ according to your GPU device
  - Adjust the train/eval settings in script/run_nmt_task.py/get_args() and launch the experiment:
```bash
# run experiments on multi30k
# (small dataset, 3 mins in total | GPU device: P40×4 | you can also fork the repo, adjust the
#  pipeline, and run this experiment on a lower-capacity GPU device)
cd script
python ./run_nmt_task.py multi30k

# Tips:
# 1. DDP is adopted for acceleration
# 2. For inference, both "greedy search" and "beam search" are included in the nmt task pipeline
#    (a greedy-decoding sketch follows below)
```
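
For reference, here is what greedy decoding for an encoder-decoder transformer looks like. The `model.encode`/`model.decode` interface and the BOS/EOS token ids are assumptions for the sake of illustration, not the actual API of this repo's nmt pipeline.

```python
# Minimal greedy-decoding sketch for an encoder-decoder transformer.
import torch

@torch.no_grad()
def greedy_decode(model, src, bos_id, eos_id, max_len=50):
    memory = model.encode(src)                        # (1, src_len, d_model), assumed interface
    ys = torch.full((1, 1), bos_id, dtype=torch.long, device=src.device)
    for _ in range(max_len):
        logits = model.decode(ys, memory)             # (1, tgt_len, vocab_size)
        next_token = logits[:, -1].argmax(dim=-1, keepdim=True)
        ys = torch.cat([ys, next_token], dim=1)       # append the most likely token
        if next_token.item() == eos_id:               # stop at end-of-sentence
            break
    return ys.squeeze(0)
```

Beam search generalizes this by keeping the top-k partial hypotheses at each step instead of only the single most likely one.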
### 4. Results

#### 4.1 Fine-tuning ViT on imagenet1k and cifar10
- This repo
  - Imagenet1k: ACC 84.9% (result on the 50,000-image val set | resolution 384 | extra label smoothing with confidence 0.9 | batch size 160, nearly 15,000 training steps)

    ![](https://github.com/TigerJeffX/Transformers-in-pytorch/raw/main/readme_supply/imagenet1k_result.png)
  - Cifar10: ACC 99.04% (resolution 224 | batch size 640, nearly 5500 training steps)

    ![](https://github.com/TigerJeffX/Transformers-in-pytorch/raw/main/readme_supply/cifar10_result.png)
- Comparison with the official results (ViT implementation by Google)

  ![](https://github.com/TigerJeffX/Transformers-in-pytorch/raw/main/readme_supply/both_benchmark.png)
#### 4.2 Training the vanilla transformer from scratch on multi30k (en → de)
## Reference materials for further study

- Transformer Survey
- Vanilla Transformer Component Structures
  - self-attention
  - multi-head
  - feed forward network
  - residual connection & layer norm
  - label smoothing related
- Recent Transformer Milestone Work in CV
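
Label smoothing shows up both in the ViT fine-tuning results above (confidence 0.9) and in this reading list. As a study aid, here is a minimal, generic formulation of label-smoothed cross entropy; it is not necessarily the exact variant implemented in this repo.

```python
# Minimal label-smoothed cross entropy: put `confidence` on the gold class and
# spread the remaining probability mass uniformly over the other classes.
import torch
import torch.nn.functional as F

def label_smoothing_ce(logits, target, confidence=0.9):
    n_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    smooth = (1.0 - confidence) / (n_classes - 1)
    true_dist = torch.full_like(log_probs, smooth)          # background mass
    true_dist.scatter_(1, target.unsqueeze(1), confidence)  # gold-class mass
    return torch.mean(torch.sum(-true_dist * log_probs, dim=-1))
```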