# Transformers in PyTorch

## What is provided in this repo?
- Model: several transformer-based milestone models are reimplemented from scratch in PyTorch

  ![](https://github.com/TigerJeffX/Transformers-in-pytorch/raw/main/readme_supply/ViT.png)
  ![](https://github.com/TigerJeffX/Transformers-in-pytorch/raw/main/readme_supply/vanilla_transformer.png)
- Experiments: conducted on CV and NLP benchmarks, respectively
  - Image Classification:

    ![](https://github.com/TigerJeffX/Transformers-in-pytorch/raw/main/readme_supply/imagenet1k_dataset.png)
    ![](https://github.com/TigerJeffX/Transformers-in-pytorch/raw/main/readme_supply/cifar10_dataset.png)
  - Neural Machine Translation:

    ![](https://github.com/TigerJeffX/Transformers-in-pytorch/raw/main/readme_supply/multi30k_dataset.png)
- Pipeline: end-to-end pipeline
  - Convenient to play with: data processing and model training/validation are integrated into a one-stop pipeline
  - Efficient to train: training and evaluation are accelerated via DistributedDataParallel (DDP) and fp16 mixed-precision training (see the sketch after this list)
  - Neat to read: a clean file structure that is easy to read yet non-trivial
    - ./script → run train/eval
    - ./model → model implementation
    - ./data → data processing
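
The speed-ups mentioned above come from wrapping the model in DDP and using apex's amp for fp16. Below is a minimal sketch of that setup; the model, data loader, and hyper-parameters are placeholders, not the actual objects defined in this repo.

```python
# Minimal DDP + fp16 (apex amp) training setup sketch.
# `model` and `train_loader` are placeholders; the usual RANK/WORLD_SIZE/
# MASTER_ADDR env vars are assumed to be set by the process launcher.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from apex import amp  # NVIDIA apex, installed as described in "Env Requirements"

def train(local_rank, model, train_loader, epochs=1):
    dist.init_process_group(backend="nccl")            # one process per GPU
    torch.cuda.set_device(local_rank)
    model = model.cuda(local_rank)

    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    # opt_level "O1": fp16 ops where safe, fp32 master weights
    model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
    model = DDP(model, device_ids=[local_rank])

    criterion = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in train_loader:
            images, labels = images.cuda(local_rank), labels.cuda(local_rank)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            with amp.scale_loss(loss, optimizer) as scaled_loss:
                scaled_loss.backward()                  # loss-scaled fp16 backward
            optimizer.step()
```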
## Usage

### 1. Environment Requirements
```bash
# Conda env
python 3.6.10
torch 1.4.0+cu100
torchvision 0.5.0+cu100
torchtext 0.5.0
spacy 3.4.1
tqdm 4.63.0

# Apex (for mixed-precision training)
## run `gcc --version`
gcc (GCC) 5.4.0
## apex installation
git clone https://github.com/NVIDIA/apex
cd apex
git checkout f3a960f80244cf9e80558ab30f7f7e8cbf03c0a0
rm -rf ./build
python setup.py install --cuda_ext --cpp_ext

# System env
## run `nvcc --version`
Cuda compilation tools, release 10.0, V10.0.130
## run `nvidia-smi`
Check your own GPU device status
```
### 2. Data Requirements

- multi30k and cifar10 are downloaded automatically by the pipeline
- imagenet1k (ILSVRC2012) needs to be downloaded manually (see the guide below)
  - Wait until all three files have downloaded:
    - ILSVRC2012_devkit_t12.tar.gz (2.5M)
    - ILSVRC2012_img_train.tar (138G)
    - ILSVRC2012_img_val.tar (6.3G)
  - When the imagenet1k pipeline is run, ILSVRC2012_img_train.tar and ILSVRC2012_img_val.tar are automatically unpacked and arranged into the directories 'data/ILSVRC2012/train' and 'data/ILSVRC2012/val'.
  - The unpacking takes more than a few hours; you can also do it faster yourself via shell (a sketch follows the download commands below).
```bash
# Guide for downloading imagenet1k
mkdir -p data/ILSVRC2012
cd data/ILSVRC2012
wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_devkit_t12.tar.gz --no-check-certificate
wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_train.tar --no-check-certificate
wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar --no-check-certificate
```
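
If you prefer to unpack the train archive yourself instead of waiting for the pipeline, the sketch below shows the usual approach: ILSVRC2012_img_train.tar is a tar of 1,000 per-class tars, which get extracted into per-class folders (the layout torchvision's ImageFolder expects). Paths are illustrative; this is not the repo's own unpacking code.

```python
# Sketch: unpack ILSVRC2012_img_train.tar into data/ILSVRC2012/train/<synset>/.
# Illustrative only -- the pipeline can do this for you, just slower.
import os
import tarfile

root = "data/ILSVRC2012"
train_dir = os.path.join(root, "train")
os.makedirs(train_dir, exist_ok=True)

with tarfile.open(os.path.join(root, "ILSVRC2012_img_train.tar")) as outer:
    for member in outer.getmembers():                # e.g. n01440764.tar, one per class
        class_name = os.path.splitext(member.name)[0]
        class_dir = os.path.join(train_dir, class_name)
        os.makedirs(class_dir, exist_ok=True)
        with tarfile.open(fileobj=outer.extractfile(member)) as inner:
            inner.extractall(class_dir)              # JPEGs for this class
```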
### 3. Run Experiments

#### 3.1 Fine-tuning ViT on imagenet1k and cifar10

- Download the pretrained ViT_B_16 model parameters from the official storage:
```bash
cd data
curl -o ViT_B_16.npz https://storage.googleapis.com/vit_models/augreg/B_16-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.01-res_224.npz
curl -o ViT_B_16_384.npz https://storage.googleapis.com/vit_models/augreg/B_16-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.0-sd_0.0--imagenet2012-steps_20k-lr_0.01-res_384.npz
```
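
The checkpoints are plain numpy .npz archives from Google's AugReg release, so a quick way to sanity-check the downloads is to open them and list a few parameter arrays. The file path below matches the curl commands above; the printed key names come from the release itself, not from identifiers in this repo.

```python
# Quick sanity check of a downloaded ViT checkpoint (.npz = named numpy arrays).
import numpy as np

weights = np.load("data/ViT_B_16.npz")
print(len(weights.files), "parameter arrays")
for name in sorted(weights.files)[:5]:
    print(name, weights[name].shape)
```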
- Before running the experiments:
  - Set the CUDA env in script/run_img_cls_task.py/__main__ according to your GPU device (a minimal illustration follows the commands below)
  - Adjust the train/eval settings in script/run_img_cls_task.py/get_args() and launch the experiment:
```bash
cd script

# run experiments on cifar10
# (4 mins/epoch, about 3.5 hours in total | GPU device: P40×4)
python ./run_image_cls_task.py cifar10

# run experiments on imagenet1k
# (less than 5 hours/epoch, more than 10 hours in total | GPU device: P40×4)
python ./run_image_cls_task.py ILSVRC2012

# Tips:
# 1. Both DDP and fp16 mixed-precision training are adopted for acceleration
# 2. The actual speed-up depends on your specific GPU device
```
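
As for the "set the CUDA env" step above, it usually amounts to restricting which GPUs are visible before anything touches CUDA. The exact variable names in script/run_img_cls_task.py may differ; this is only an illustration.

```python
# Illustration: pick the GPUs before CUDA is initialized.
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"   # adjust to your device
n_gpus = len(os.environ["CUDA_VISIBLE_DEVICES"].split(","))
print("training will see %d GPU(s)" % n_gpus)
```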
#### 3.2 Training the vanilla transformer from scratch on multi30k (en → de)
- Before running the experiments:
  - Set the CUDA env in script/run_nmt_task.py/__main__ according to your GPU device
  - Adjust the train/eval settings in script/run_nmt_task.py/get_args() and launch the experiment:
```bash
# run experiments on multi30k
# (small dataset, 3 mins in total | GPU device: P40×4 | you can also fork the repo, adjust the
#  pipeline, and run this experiment on a lower-capacity GPU device)
cd script
python ./run_nmt_task.py multi30k

# Tips:
# 1. DDP is adopted for acceleration
# 2. For inference, both "greedy search" and "beam search" are included in the nmt task pipeline
#    (a greedy-decoding sketch follows below)
```
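
For reference, here is what greedy decoding for an encoder-decoder transformer looks like. The `model.encode`/`model.decode` interface and the BOS/EOS token ids are assumptions for the sake of illustration, not the actual API of this repo's nmt pipeline.

```python
# Minimal greedy-decoding sketch for an encoder-decoder transformer.
import torch

@torch.no_grad()
def greedy_decode(model, src, bos_id, eos_id, max_len=50):
    memory = model.encode(src)                        # (1, src_len, d_model), assumed interface
    ys = torch.full((1, 1), bos_id, dtype=torch.long, device=src.device)
    for _ in range(max_len):
        logits = model.decode(ys, memory)             # (1, tgt_len, vocab_size)
        next_token = logits[:, -1].argmax(dim=-1, keepdim=True)
        ys = torch.cat([ys, next_token], dim=1)       # append the most likely token
        if next_token.item() == eos_id:               # stop at end-of-sentence
            break
    return ys.squeeze(0)
```

Beam search generalizes this by keeping the top-k partial hypotheses at each step instead of only the single most likely one.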
### 4. Results

#### 4.1 Fine-tuning ViT on imagenet1k and cifar10
- This repo
  - Imagenet1k: ACC 84.9% (result on the 50,000-image val set | resolution 384 | extra label smoothing with confidence 0.9 | batch size 160, nearly 15,000 training steps)

    ![](https://github.com/TigerJeffX/Transformers-in-pytorch/raw/main/readme_supply/imagenet1k_result.png)
  - Cifar10: ACC 99.04% (resolution 224 | batch size 640, nearly 5500 training steps)

    ![](https://github.com/TigerJeffX/Transformers-in-pytorch/raw/main/readme_supply/cifar10_result.png)
- Comparison with the official results (ViT implementation by Google)

  ![](https://github.com/TigerJeffX/Transformers-in-pytorch/raw/main/readme_supply/both_benchmark.png)
#### 4.2 Training the vanilla transformer from scratch on multi30k (en → de)
## Reference materials for further study

- Transformer Survey
- Vanilla Transformer Component Structures
  - self-attention
  - multi-head
  - feed forward network
  - residual connection & layer norm
  - label smoothing related
- Recent Transformer Milestone Work in CV
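
Label smoothing shows up both in the ViT fine-tuning results above (confidence 0.9) and in this reading list. As a study aid, here is a minimal, generic formulation of label-smoothed cross entropy; it is not necessarily the exact variant implemented in this repo.

```python
# Minimal label-smoothed cross entropy: put `confidence` on the gold class and
# spread the remaining probability mass uniformly over the other classes.
import torch
import torch.nn.functional as F

def label_smoothing_ce(logits, target, confidence=0.9):
    n_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    smooth = (1.0 - confidence) / (n_classes - 1)
    true_dist = torch.full_like(log_probs, smooth)          # background mass
    true_dist.scatter_(1, target.unsqueeze(1), confidence)  # gold-class mass
    return torch.mean(torch.sum(-true_dist * log_probs, dim=-1))
```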