Non-Autoregressive Neural Machine Translation
https://arxiv.org/abs/1711.02281
Abstract
Features
- Non-autoregressive (the output tokens have no dependency on one another)
- Parallel outputs
How
- Knowledge distillation
- Input token fertilities
- Policy Gradient
1. Introduction
Recent architectures (CNNs and self-attention networks, i.e. the Transformer) already parallelize training; this paper's model additionally avoids autoregressive decoding
2. Background
2.1. Autoregressive Neural Machine Translation
- The Transformer's masked self-attention enables parallel teacher-forced training and has advantages over convolutions (see the masking sketch below)
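
A minimal NumPy sketch (my own illustration, not code from the paper) of how a causal mask lets all target positions be trained in one parallel pass while each position still only attends to earlier ones:

```python
import numpy as np

def causal_attention(q, k, v):
    """Scaled dot-product attention with a causal (lower-triangular) mask.

    q, k, v: arrays of shape (T, d). Position t may only attend to
    positions <= t, which makes one parallel teacher-forced pass
    equivalent to sequential left-to-right decoding.
    """
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)                    # (T, T) attention logits
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[mask] = -1e9                              # block attention to future positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v

# All T positions are computed in a single parallel call during training:
T, d = 5, 8
x = np.random.randn(T, d)
out = causal_attention(x, x, x)
```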
2.2. Non-Autoregressive Decoding
Problems of beam-search
- suffers from diminishing returns with respect to beam size
- limits search parallelism
The output length T is treated as a random variable: the model first predicts T, then emits all T tokens independently (see the factorization below)
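
The two factorizations from Section 2 of the paper, reconstructed in the paper's notation (x_{1:T'} is the source, y_{1:T} the target):

```latex
% Autoregressive factorization: chain rule over target tokens
p_{\mathrm{AR}}(Y \mid X) = \prod_{t=1}^{T} p(y_t \mid y_{1:t-1}, x_{1:T'})

% Naive non-autoregressive factorization: predict the length T first,
% then emit all tokens conditionally independently (and thus in parallel)
p_{\mathrm{NA}}(Y \mid X) = p_L(T \mid x_{1:T'}) \cdot \prod_{t=1}^{T} p(y_t \mid x_{1:T'})
```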
2.3. The multimodality problem
The multimodality problem: the distribution over correct target translations is highly multimodal (many valid outputs exist), and conditionally independent per-token predictions can mix modes. The paper's example: "Thank you" can translate to "Danke schön." or "Vielen Dank.", and a non-autoregressive model may blend them into outputs like "Danke Dank."
3. The Non-Autoregressive Transformer
3.3. Modeling fertility to tackle the multimodality problem
Fertility supervision comes from IBM Model 2 word alignments (computed with an external aligner).
Definition of fertilities and their benefits
- Definition: the number of times each input word is copied into the decoder input
- Provides a natural factorization that dramatically reduces the space of modes
- Makes the decoder's job easier: each position knows which source word it should translate (see the expansion sketch below)
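
A toy Python sketch (my own, with made-up tokens and fertility values) of how fertilities expand the source into the decoder input; note the target length falls out as T = sum of fertilities, so no separate length model is needed:

```python
def expand_by_fertility(src_tokens, fertilities):
    """Copy each source token fertilities[i] times to build the
    decoder input. The target length is T = sum(fertilities)."""
    assert len(src_tokens) == len(fertilities)
    decoder_input = []
    for tok, f in zip(src_tokens, fertilities):
        decoder_input.extend([tok] * f)  # f = 0 drops the word entirely
    return decoder_input

# Hypothetical example: fertility 2 lets one source word produce two
# target words; fertility 0 would drop a word.
src = ["totally", "acceptable"]
fert = [2, 1]
print(expand_by_fertility(src, fert))  # ['totally', 'totally', 'acceptable']
```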
3.4. Translation predictor and the decoding process
- Argmax decoding: take the highest-probability fertility for each input word
- Average decoding: round the expectation of each fertility's softmax distribution
- Noisy parallel decoding (NPD): sample many fertility sequences, decode each, and rescore the candidates with the autoregressive teacher (sketched below)
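
A hedged sketch of noisy parallel decoding as I read it: sample several fertility sequences, decode each candidate fully in parallel, and keep the one the autoregressive teacher scores highest. `fertility_model`, `nat_decode`, and `teacher_score` are hypothetical stand-ins, not the paper's actual interfaces.

```python
def noisy_parallel_decode(src, fertility_model, nat_decode, teacher_score, n_samples=8):
    """Noisy parallel decoding (NPD), sketched:
    1. draw n_samples fertility sequences for the source sentence,
    2. run the non-autoregressive decoder once per sample (all samples
       and all positions can be computed in parallel),
    3. keep the candidate the autoregressive teacher scores highest.
    """
    candidates = []
    for _ in range(n_samples):
        fert = fertility_model.sample(src)   # one fertility per source token
        y = nat_decode(src, fert)            # argmax over tokens, given fertilities
        candidates.append(y)
    # Rescoring is also parallelizable: scoring a *given* target needs no
    # sequential search, just one teacher forward pass per candidate.
    return max(candidates, key=lambda y: teacher_score(src, y))
```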
4. Training
~~I didn't like this section~~
4.2. Fine-Tuning
- The fine-tuning loss combines a word-level knowledge-distillation term (KL divergence against the teacher's output distribution) with reinforcement-learning (policy-gradient) and backpropagation-based terms for the non-differentiable fertility choice
- Word-level knowledge distillation uses an autoregressive Transformer as the teacher
- Requires an external fertility inference model
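
A minimal PyTorch sketch of the word-level knowledge-distillation term only (the paper's full fine-tuning objective also includes the RL and fertility terms, omitted here); shapes and names are my own assumptions:

```python
import torch
import torch.nn.functional as F

def word_level_kd_loss(student_logits, teacher_logits):
    """Word-level knowledge distillation: match the student's per-position
    output distribution to the teacher's via KL divergence.

    student_logits, teacher_logits: (batch, T, vocab) unnormalized scores.
    """
    teacher_probs = F.softmax(teacher_logits, dim=-1)    # target distribution
    student_logp = F.log_softmax(student_logits, dim=-1)
    # KL(teacher || student), summed over positions and vocab,
    # averaged over the batch
    return F.kl_div(student_logp, teacher_probs, reduction="batchmean")
```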
Todo
- (3.4) Read up on average decoding and noisy parallel decoding