Fairseq tutorial
================
This is a tutorial document for `pytorch/fairseq <https://github.com/pytorch/fairseq>`_.
.. contents:: Table of Contents
Preface
=======
The current stable release of Fairseq is v0.x, but v1.x will be released soon; the interface changes significantly between the two versions. This document is based on v1.x, assuming that you are just starting your research.
================== ================================== =======================
\                  v0.x                               v1.x
================== ================================== =======================
Configuration      ``args``: ArgumentParser.Namespace ``cfg``: OmegaConf
Add options        ``add_args(self, args)``           ``@dataclass``
Training command   ``fairseq-train``                  ``fairseq-hydra-train``
================== ================================== =======================
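For example, in v1.x a component declares its options as a dataclass instead of implementing ``add_args()``. A minimal sketch of the v1.x style (the config class and its field are illustrative, not an actual fairseq config):

.. code:: python

   # Hypothetical v1.x-style option definition (sketch): the fields of the
   # dataclass become configuration options instead of ArgumentParser flags.
   from dataclasses import dataclass, field

   from fairseq.dataclass import FairseqDataclass


   @dataclass
   class MyCriterionConfig(FairseqDataclass):
       # "label_smoothing" is an illustrative option name.
       label_smoothing: float = field(
           default=0.0,
           metadata={"help": "epsilon for label smoothing"},
       )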
This document assumes that you understand virtual environments (e.g., pipenv, poetry, or venv) and ``CUDA_VISIBLE_DEVICES``.
Installation
============
I recommend installing fairseq from source in a virtual environment.
.. code:: bash
   git clone https://github.com/pytorch/fairseq
   cd fairseq
   pip install --editable ./
If you want faster training, install NVIDIA’s apex library.
.. code:: bash
   git clone https://github.com/NVIDIA/apex
   cd apex
   pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
       --global-option="--deprecated_fused_adam" --global-option="--xentropy" \
       --global-option="--fast_multihead_attn" ./
Training a Transformer NMT model
================================
See https://github.com/de9uch1/fairseq-tutorial/tree/master/examples/translation.
Code walk
=========
Commands
--------

``fairseq-preprocess``
   Build vocabularies and binarize training data.
``fairseq-train``
   Train a new model.
``fairseq-hydra-train``
   Train a new model with hydra.
``fairseq-generate``
   Generate sequences (e.g., translation, summarization, POS tagging, etc.).
``fairseq-interactive``
   Generate from raw text with a trained model (see the Python sketch below).
``fairseq-validate``
   Validate a model (compute the validation loss).
``fairseq-eval-lm``
   Evaluate the perplexity of a trained language model.
``fairseq-score``
   Compute BLEU. I recommend using sacreBLEU instead of ``fairseq-score``.
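Most of this functionality is also reachable from Python. For instance, the hub interface gives ``fairseq-interactive``-style generation; a minimal sketch, assuming you already have a trained Transformer checkpoint (all paths below are placeholders):

.. code:: python

   # Roughly the Python counterpart of fairseq-interactive (sketch).
   from fairseq.models.transformer import TransformerModel

   model = TransformerModel.from_pretrained(
       "checkpoints/",                        # placeholder: directory with the checkpoint
       checkpoint_file="checkpoint_best.pt",  # placeholder: checkpoint file name
       data_name_or_path="data-bin/",         # placeholder: binarized data (dictionaries)
       # depending on your preprocessing, tokenizer/BPE options may also be needed
   )
   print(model.translate("Hallo Welt!"))      # returns the translated string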
Tools
-----

Here are some of the most commonly used ones:

``scripts/average_checkpoints.py``
   Load checkpoints and return a model with averaged weights (sketched below).
``scripts/rm_pt.py``
   Remove unnecessary checkpoints, such as per-epoch checkpoints.
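Conceptually, checkpoint averaging just averages the parameter tensors of several saved checkpoints. A minimal sketch of the idea (not the actual implementation of ``scripts/average_checkpoints.py``; the paths are placeholders):

.. code:: python

   # Average the weights of several checkpoints (conceptual sketch).
   import torch

   paths = [f"checkpoints/checkpoint{i}.pt" for i in range(96, 101)]  # placeholders
   avg = None
   for p in paths:
       # fairseq checkpoints store the weights under the "model" key.
       state = torch.load(p, map_location="cpu")["model"]
       if avg is None:
           avg = {k: v.clone().float() for k, v in state.items()}
       else:
           for k in avg:
               avg[k] += state[k].float()
   avg = {k: v / len(paths) for k, v in avg.items()}
   torch.save({"model": avg}, "checkpoints/averaged.pt")

Note that the real script also preserves the rest of the checkpoint metadata so that the result remains loadable by fairseq.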
Examples: examples/
-------------------

- Translation

  - back translation
  - noisy channel
  - alignment
  - constrained decoding
  - simultaneous translation
  - MoE
  - WMT19 winner system
  - multilingual translation
  - scaling NMT

- Paraphraser
- Language model
- Summarization

  - BART
  - pointer-generator

- Unsupervised quality estimation
- LASER, XLM, Linformer
- Speech-to-Text
- wav2vec
- Story generation
- etc.
Components: fairseq/*
---------------------

``criterions/``
   Compute the loss for the given sample.
``data/``
   Dictionary, dataset, word/sub-word tokenizers.
``dataclass/``
   Common options.
``distributed/``
   Library for distributed and/or multi-GPU training.
``logging/``
   Logging, progress bar, Tensorboard, WandB.
``modules/``
   NN layers, sub-networks, activation functions, quantization.
``models/``
   NN models:

   - BERT, RoBERTa, BART, XLM-R, huggingface models
   - Non-autoregressive Transformers

     - NAT
     - Insertion Transformer
     - CMLM
     - Levenshtein Transformer
     - CRF NAT

   - Speech-to-Text Transformer
   - wav2vec
   - LSTM + attention (Luong et al., 2015)
   - Fully convolutional model (Gehring et al., 2017)
   - Transformer (Vaswani et al., 2017)
   - Alignment (Garg et al., 2019)
   - Multilingual

``optim/``
   Optimizers, FP16:

   - Adadelta
   - Adafactor
   - Adagrad
   - Adam
   - SGD
   - etc.

``optim/lr_scheduler/``
   Learning rate schedulers:

   - Cosine
   - Fixed
   - Inverse square root (Vaswani et al., 2017)
   - Polynomial decay
   - Triangular
   - etc.

``tasks/``
   - Audio pretraining / fine-tuning
   - Denoising
   - Language modeling
   - Masked LM, cross-lingual LM
   - Reranking
   - Translation
   - etc.

``registry.py``
   Criterion, model, task, and optimizer manager (see the registration sketch after this list).
``search.py``
   - Beam search
   - Lexically constrained beam search
   - Length-constrained beam search
   - Sampling

``sequence_generator.py``
   Generate output sequences for a given sentence.
``sequence_scorer.py``
   Score the sequence for a given sentence.
``trainer.py``
   Library for training a network.
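The registry is what makes new components selectable by name from the command line. A minimal sketch of registering a custom model (``my_transformer`` and the class are illustrative, not part of fairseq):

.. code:: python

   # Register a custom model under a new name (sketch).
   from fairseq.models import register_model
   from fairseq.models.transformer import TransformerModel


   @register_model("my_transformer")  # illustrative name
   class MyTransformerModel(TransformerModel):
       """A Transformer variant; override build_model()/forward() as needed."""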
Training flow of translation
----------------------------

main: ``fairseq_cli/train.py``

- ``fairseq_cli/hydra_train.py`` sets the options and then calls ``fairseq_cli/train.py``.
1. Parse the options defined by `dataclass <https://docs.python.org/3/library/dataclasses.html>`__:

   - ``fairseq.tasks.translation.TranslationConfig``
   - ``fairseq.models.transformer.transformer_config.TransformerConfig``
   - ``fairseq.criterions.label_smoothed_cross_entropy.LabelSmoothedCrossEntropyConfig``
   - ``fairseq.optim.adam.FairseqAdamConfig``
   - ``fairseq.dataclass.configs.FairseqConfig``

   The options are stored in `OmegaConf <https://github.com/omry/omegaconf>`_, so they can be accessed in attribute style (``cfg.foobar``) and in dictionary style (``cfg["foobar"]``), as in the snippet below.

   .. note:: In v0.x, options are defined by ``ArgumentParser``.
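   A quick illustration of the two access styles (the keys are illustrative):

   .. code:: python

      from omegaconf import OmegaConf

      # Attribute-style and dictionary-style access are interchangeable.
      cfg = OmegaConf.create({"optimization": {"max_epoch": 10}})
      assert cfg.optimization.max_epoch == cfg["optimization"]["max_epoch"] == 10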
2. Set up the task:

   - ``fairseq.tasks.translation.TranslationTask.setup_task()``: class method

     - Load the dictionaries
     - Build and return ``self`` (``TranslationTask``)
3. Build the model and criterion:

   - ``fairseq.tasks.translation.TranslationTask.build_model()``
     → ``fairseq.models.transformer.transformer_legacy.TransformerModel.build_model()``: class method
     (this class is kept to maintain compatibility with v0.x)
     → ``fairseq.models.transformer.transformer_base.TransformerModelBase.build_model()``: class method
     (builds the embeddings, encoder, and decoder)
   - ``fairseq.criterions.label_smoothed_cross_entropy.LabelSmoothedCrossEntropyCriterion``
4. Build the trainer:

   - ``fairseq.trainer.Trainer``

     - Load the training set and build the data iterator
     - Build the optimizer and the learning rate scheduler
5. Start the training loop:

   a. Call ``fairseq.trainer.Trainer.train_step()`` (sketched in plain PyTorch below):

      i.   Reset the gradients.
      ii.  Set the model to train mode.
      iii. Call ``task.train_step()``:

           - Compute the loss of the given sentences via ``criterion(model, sample)``.
           - Compute the gradients.

      iv.  Loop i.–iii. ``cfg.optimization.update_freq`` times to accumulate the gradients.
      v.   Reduce the gradients across workers (for multi-node/multi-GPU training).
      vi.  Clip the gradients.
      vii. Update the model parameters via ``task.optimizer_step()``.
      viii. Log statistics.

   b. Loop a. until the end of the epoch.
   c. Compute the validation loss.
   d. Save the model checkpoint.
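The inner loop above, expressed as a plain PyTorch sketch (a toy model and random data stand in for the Transformer and the translation batches; this is not ``Trainer.train_step()`` itself):

.. code:: python

   # Gradient accumulation over update_freq steps, as in step 5.a above (sketch).
   import torch
   from torch import nn

   model = nn.Linear(8, 2)                     # stand-in for the Transformer
   optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
   criterion = nn.CrossEntropyLoss()           # stand-in for label-smoothed CE
   update_freq = 4                             # cf. cfg.optimization.update_freq

   model.train()                               # ii. set the model to train mode
   optimizer.zero_grad()                       # i. reset the gradients
   for step in range(100):
       x = torch.randn(16, 8)                  # random stand-in batch
       y = torch.randint(0, 2, (16,))
       loss = criterion(model(x), y)           # iii. criterion(model, sample)
       loss.backward()                         # accumulate the gradients
       if (step + 1) % update_freq == 0:       # iv. loop i.-iii. update_freq times
           # v. (the gradient all-reduce across workers would happen here)
           torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # vi. clip
           optimizer.step()                    # vii. update the parameters
           optimizer.zero_grad()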
Generation flow of translation
------------------------------

main: ``fairseq_cli/generate.py``
1. Parse the options defined by `dataclass <https://docs.python.org/3/library/dataclasses.html>`__:

   - ``fairseq.tasks.translation.TranslationConfig``
   - ``fairseq.models.transformer.transformer_config.TransformerConfig``
   - ``fairseq.dataclass.configs.FairseqConfig``
2. Set up the task:

   - ``fairseq.tasks.translation.TranslationTask.setup_task()``: class method

     - Load the dictionaries
     - Build and return ``self`` (``TranslationTask``)
3. Load the model and dataset:

   - ``checkpoint_utils.load_model_ensemble()``: build the model(s) and load the parameters
   - ``task.load_dataset()``: load the dataset
4. Build the generator:

   - ``task.build_generator()`` → ``fairseq.sequence_generator.SequenceGenerator``
5. Generation (a condensed sketch follows this list):

   - Call ``task.inference_step()``

     - Call ``SequenceGenerator.generate()``

       - Search with ``fairseq.search.BeamSearch``

   - Output the results
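A condensed sketch of this flow (the checkpoint path is a placeholder; moving tensors to the GPU, scoring, and detokenization are omitted):

.. code:: python

   # Load an ensemble, build the generator, and decode the test set (sketch).
   from fairseq import checkpoint_utils

   # 3. Build the model(s) and load the parameters.
   models, cfg, task = checkpoint_utils.load_model_ensemble_and_task(
       ["checkpoints/checkpoint_best.pt"]     # placeholder path
   )
   task.load_dataset(cfg.dataset.gen_subset)  # e.g. "test"

   # 4. Build the generator (SequenceGenerator with beam search by default).
   generator = task.build_generator(models, cfg.generation)

   # 5. task.inference_step() calls SequenceGenerator.generate() internally.
   itr = task.get_batch_iterator(
       dataset=task.dataset(cfg.dataset.gen_subset),
       max_tokens=cfg.dataset.max_tokens,
   ).next_epoch_itr(shuffle=False)
   for sample in itr:
       hypos = task.inference_step(generator, models, sample)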
Customize and extend fairseq
============================

See https://github.com/de9uch1/dbsa.