
Non-Autoregressive Neural Machine Translation


https://arxiv.org/abs/1711.02281

Abstract

Features

  • Non-autoregressive: output tokens have no dependency on each other
  • Outputs are produced in parallel
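
A quick contrast between the two factorizations (notation mine, following the paper's setup):

```latex
% Autoregressive: token t conditions on all previously generated tokens
p_{\mathrm{AR}}(Y \mid X) = \prod_{t=1}^{T} p(y_t \mid y_{<t}, X)

% Non-autoregressive: tokens are conditionally independent given the source,
% so all T positions can be predicted in one parallel pass
p_{\mathrm{NA}}(Y \mid X) = \prod_{t=1}^{T} p(y_t \mid X)
```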

How

  • Knowledge distillation
  • Input token fertilities
  • Policy Gradient

1. Introduction

The model builds on parallel, non-recurrent architectures (CNNs and self-attention networks, i.e. the Transformer) and removes the remaining autoregressive bottleneck in decoding.

2. Background

2.1. Autoregressive Neural Machine Translation

  • The Transformer's causal attention masking enforces the autoregressive property more flexibly than CNN-based decoders (a minimal masking sketch follows this list)
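
For reference, a minimal sketch (mine, not from the paper) of the causal masking the Transformer uses to keep training parallel while preserving the autoregressive property:

```python
import numpy as np

def causal_mask(T: int) -> np.ndarray:
    """Causal mask: position t may attend only to positions <= t.

    Added to the attention logits before the softmax; -inf entries
    zero out the corresponding attention weights.
    """
    return np.triu(np.full((T, T), -np.inf), k=1)  # strictly upper triangle = -inf

# Example: a 4-token target. Row t shows which positions token t may see.
print(causal_mask(4))
```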

2.2. Non-Autoregressive decoding

Problems with beam search:

  • suffers from diminishing returns with respect to beam size
  • limits search parallelism

They model the output length T as a random variable predicted from the source, rather than having it emerge from an end-of-sentence token during decoding.
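
Concretely, the Section 2.2 factorization with the target length modeled explicitly:

```latex
% T is predicted first from the source; the tokens then factorize independently
p_{\mathrm{NA}}(Y \mid X) = p_L(T \mid x_{1:n}) \cdot \prod_{t=1}^{T} p(y_t \mid x_{1:n})
```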

2.3. The multimodality problem

The multimodality problem: the distribution over target translations is highly multimodal (one source sentence has many valid translations), which conditionally independent per-token predictions cannot capture. The paper's example: "Thank you" can become "Danke schön." or "Vielen Dank.", and an independent decoder can mix the modes into invalid outputs like "Danke Dank."

3. The non-autoregressive transformer

[Figure: the non-autoregressive Transformer (NAT) architecture]

3.3. Modeling fertility to tackle the multimodality problem

Fertility targets come from an external aligner based on IBM Model 2.
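
Schematically, fertilities f_1, …, f_n (one per source word) enter as a latent variable that the model marginalizes over (following Section 3.3; x_i{f_i} means x_i repeated f_i times):

```latex
p_{\mathrm{NA}}(Y \mid X) =
  \sum_{f \in \mathcal{F}} p_F(f_1, \ldots, f_n \mid x_{1:n})
  \prod_{t=1}^{T} p\bigl(y_t \mid x_1\{f_1\}, \ldots, x_n\{f_n\}\bigr),
\qquad
\mathcal{F} = \Bigl\{ f_1, \ldots, f_n \;\Bigm|\; \textstyle\sum_{i=1}^{n} f_i = T \Bigr\}
```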

Definition of fertilities and their benefits

  • Definition: a source word's fertility is the number of times it is copied into the decoder input
  • Provides a natural factorization that dramatically reduces the mode space
  • Makes the decoder's job easier, since each output position is tied to a specific source word (see the copying sketch after this list)
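
A minimal sketch (names mine) of how fertilities turn the source sequence into decoder inputs: each source token x_i is copied f_i times, so the decoder input length automatically equals the target length T = Σ f_i.

```python
from typing import List

def expand_by_fertility(src_tokens: List[str], fertilities: List[int]) -> List[str]:
    """Copy each source token fertilities[i] times to build the decoder input.

    The resulting sequence has length sum(fertilities), which fixes the
    target length T, so all output positions can be decoded in parallel.
    """
    decoder_input = []
    for token, f in zip(src_tokens, fertilities):
        decoder_input.extend([token] * f)  # f == 0 drops the token entirely
    return decoder_input

# Example: a fertility of 0 deletes a word, 2 duplicates one.
src = ["we", "totally", "accept", "it"]
print(expand_by_fertility(src, [1, 0, 2, 1]))
# -> ['we', 'accept', 'accept', 'it']
```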

3.4. Translation predictor and the decoding process

  • Argmax decoding: take the highest-probability fertility for each source word
  • Average decoding: use the (rounded) expectation of each word's fertility distribution
  • Noisy parallel decoding (NPD): sample many fertility sequences, decode them all in parallel, and rescore with the autoregressive teacher (see the sketch after this list)
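
A schematic sketch of noisy parallel decoding (the `sample_fertilities`, `nat_decode`, and `teacher_score` helpers are hypothetical placeholders, not the paper's API):

```python
from typing import Callable, List

def noisy_parallel_decode(
    sample_fertilities: Callable[[], List[int]],   # draw one fertility sequence
    nat_decode: Callable[[List[int]], List[str]],  # parallel NAT decode given fertilities
    teacher_score: Callable[[List[str]], float],   # autoregressive teacher log-prob
    num_samples: int = 100,
) -> List[str]:
    """Sample fertility sequences, decode each candidate in parallel,
    then let the autoregressive teacher pick the best translation."""
    candidates = []
    for _ in range(num_samples):
        f = sample_fertilities()          # one sample from p_F(f | x)
        candidates.append(nat_decode(f))  # each decode is a single parallel pass
    # Rescoring is the only use of the sequential teacher, once per candidate.
    return max(candidates, key=teacher_score)
```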

4. Training

~~I didn't like this section~~


4.2. Fine-Tuning

Fine-tuning combines word-level knowledge distillation (a KL-divergence term against the teacher's distribution) with reinforcement-learning and backpropagation-based terms over the fertility distribution.
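
A minimal PyTorch-style sketch (mine, not the paper's exact objective) of the word-level distillation term: the student's per-token distribution is pulled toward the teacher's with a KL divergence.

```python
import torch
import torch.nn.functional as F

def word_level_kd_loss(student_logits: torch.Tensor,
                       teacher_logits: torch.Tensor) -> torch.Tensor:
    """KL(teacher || student) averaged over all target positions.

    Both tensors have shape (batch, target_len, vocab). The teacher is the
    pretrained autoregressive model; its distribution is treated as fixed.
    """
    teacher_probs = F.softmax(teacher_logits, dim=-1).detach()  # no grad to teacher
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    # kl_div expects log-probs as input and probs as target.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
```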

Word-level knowledge distillation (from the autoregressive teacher)

External fertility inference model

Todo

  • (3.4) Read up on average decoding and noisy parallel decoding in more detail
