
Language Modeling with Gated Convolutional Networks


https://arxiv.org/abs/1612.08083

Abstract

  • proposes a gating mechanism (the gated linear unit, GLU) for convolutional language models
  • uses WikiText-103 and the Google Billion Word benchmark
  • the proposed model is competitive with strong recurrent models on large-scale language tasks

1. Introduction

  • convolutional networks benefit from parallelization
    • but cuDNN is not yet optimized for 1-D convolutions
  • gated linear units (GLU) mitigate the vanishing gradient problem
  • compared with the LSTM-style gating (GTU) used in PixelCNN (Oord et al., 2016), GLU performs better

2. Approach

  • convolutions have no temporal dependencies, unlike recurrent models, so computation can be parallelized over sequence positions
  • recurrent models have unbounded context, but the paper's experiments show that this is not necessary
  • (figure: gated linear unit)
  • (figure: overall model architecture)
  • the model uses an adaptive softmax, which assigns higher capacity to very frequent words and lower capacity to rare words
    • this results in faster computation and lower memory usage (see the sketch after this list)
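
A minimal PyTorch sketch of how such an adaptive softmax output layer could be wired up; the dimensions and cutoff values below are illustrative placeholders, not the paper's settings:

```python
import torch
import torch.nn as nn

# Illustrative sizes; the paper tunes capacities per dataset.
hidden_dim, vocab_size, batch, seq_len = 512, 100_000, 8, 20

# Frequent words go to the full-capacity head; rare words fall into
# lower-capacity tail clusters defined by the frequency-based cutoffs.
adaptive_softmax = nn.AdaptiveLogSoftmaxWithLoss(
    in_features=hidden_dim,
    n_classes=vocab_size,
    cutoffs=[2_000, 10_000, 50_000],  # placeholder buckets, not the paper's
)

hidden = torch.randn(batch * seq_len, hidden_dim)           # model outputs, flattened
targets = torch.randint(0, vocab_size, (batch * seq_len,))  # next-word labels
result = adaptive_softmax(hidden, targets)
print(result.loss)  # average negative log-likelihood
```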

3. Gating Mechanisms

  • the purpose of the gating mechanism is to control what information is propagated through the hierarchy of layers
  • compared to GTU (the LSTM-style gating mechanism), the GTU gradient \nabla[\tanh(X) \otimes \sigma(X)] = \tanh'(X)\nabla X \otimes \sigma(X) + \sigma'(X)\nabla X \otimes \tanh(X) gradually vanishes as layers are stacked because of the downscaling factors \tanh'(X) and \sigma'(X), whereas the GLU gradient \nabla[X \otimes \sigma(X)] = \nabla X \otimes \sigma(X) + X \otimes \sigma'(X)\nabla X has the path \nabla X \otimes \sigma(X) with no downscaling factor
    • this can be thought of as a multiplicative skip connection that helps gradients flow through the layers (see the sketch after this list)
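
A short PyTorch sketch contrasting the two gates on the output of a causal 1-D convolution; the layer sizes are placeholders, and the GLU line is equivalent to torch.nn.functional.glu:

```python
import torch
import torch.nn as nn

batch, channels, seq_len, kernel = 8, 128, 20, 4

# One convolution producing 2*channels outputs: half are values, half are gates.
# padding=kernel-1 plus the trailing slice keeps the convolution causal
# (no position sees future tokens).
conv = nn.Conv1d(channels, 2 * channels, kernel, padding=kernel - 1)
x = torch.randn(batch, channels, seq_len)
h = conv(x)[:, :, :seq_len]

a, b = h.chunk(2, dim=1)

# GLU (this paper): linear path times sigmoid gate -> one gradient term
# has no downscaling factor.
glu_out = a * torch.sigmoid(b)

# GTU (LSTM-style gate, as in PixelCNN): tanh path times sigmoid gate ->
# both gradient terms are scaled by tanh'(a) or sigmoid'(b), which vanish.
gtu_out = torch.tanh(a) * torch.sigmoid(b)
```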

4. Experimental Setup

4.2. Training

  • uses gradient clipping during training, and it works well (see the sketch below)
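
A rough sketch of that step in a PyTorch training loop, using a tiny stand-in model; the clipping threshold here is a placeholder, not the paper's value:

```python
import torch
import torch.nn as nn

# Tiny stand-in model just to demonstrate the clipping call.
model = nn.Linear(16, 16)
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)

loss = model(torch.randn(4, 16)).pow(2).mean()
loss.backward()

# Rescale the global gradient norm if it exceeds the threshold.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.1)

optimizer.step()
optimizer.zero_grad()
```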

4.3. Hyper-parameters

  • initializes layers with Kaiming initialization (see the sketch below)
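
A sketch of applying Kaiming (He) initialization to the convolutional layers in PyTorch; restricting it to Conv1d weights and zeroing biases is an assumption made here for illustration:

```python
import torch.nn as nn

def init_weights(module: nn.Module) -> None:
    # Kaiming-normal initialization for conv weights, zeros for biases.
    if isinstance(module, nn.Conv1d):
        nn.init.kaiming_normal_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

model = nn.Sequential(nn.Conv1d(128, 256, 4), nn.Conv1d(256, 256, 4))
model.apply(init_weights)
```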

5. Results

5.3. Non-linear Modeling

TODO

  • read http://proceedings.mlr.press/v9/gutmann10a/gutmann10a.pdf
