
An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling


This paper (https://arxiv.org/abs/1803.01271) introduces Temporal Convolutional Networks (TCNs).

Summary

An empirical study showing that a generic convolutional model (the Temporal Convolutional Network, TCN) outperforms canonical recurrent networks across a range of sequence modeling tasks.

Abstract

  • Convolutional networks should be regarded as a natural starting point for sequence modeling tasks

1. Introduction

  • Recurrent models are usually the first approach to sequence modeling
  • However, there is research showing that convolutional models can reach state-of-the-art results on sequence tasks
  • This paper presents a single TCN architecture that is applied across all tasks
  • The TCN is
    • simpler and clearer than canonical recurrent networks
    • a combination of best practices from modern convolutional architectures
    • able to outperform baseline recurrent architectures
    • able to retain a longer effective memory / history

3. Temporal Convolutional Networks

[Figure 1 from the paper: (a) dilated causal convolution, (b) TCN residual block, (c) example residual connection]

  • The paper aims for a simple yet powerful architecture
  • Characteristics of TCNs:
    • there is no information leakage from the future to the past
    • the architecture can take a sequence of any length and produce an output sequence of the same length
    • it uses residual layers and dilated convolutions

3.1. Sequence Modeling


  • The goal of learning in the sequence modeling setting is to find a network f that minimizes some expected loss between the actual outputs and the predictions, as formalized below.
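
A sketch of the setup from Section 3.1: a sequence model maps an input sequence to an output sequence of the same length, under the constraint that each prediction may only use past and present inputs.

```latex
% A sequence model maps x_0, ..., x_T to predictions \hat{y}_0, ..., \hat{y}_T,
% where \hat{y}_t may depend only on x_0, ..., x_t (no leakage from the future).
\hat{y}_0, \ldots, \hat{y}_T = f(x_0, \ldots, x_T)

% Learning searches for the network f that minimizes the expected loss
% between the true outputs and the predictions:
f^{\ast} = \arg\min_f \; \mathbb{E}\!\left[ L\!\left(y_0, \ldots, y_T,\; f(x_0, \ldots, x_T)\right) \right]
```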

3.2. Causal Convolutions

  • TCN principles
    1. the input and output sequences have the same length
    2. there is no leakage from the future into the past
  • To achieve principle 2, the TCN uses causal convolutions
    • i.e., convolutions where an output at time t is convolved only with elements from time t and earlier in the previous layer
      • this is essentially the same idea as masked convolution (van den Oord et al., 2016)
    • TCN = 1D FCN + causal convolutions (see the sketch below)
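
A minimal PyTorch sketch of a causal convolution via left-only padding. This is illustrative only, not the authors' reference implementation; the class and argument names are my own.

```python
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """Causal 1D convolution: the output at time t only sees inputs at time t and earlier."""
    def __init__(self, in_channels, out_channels, kernel_size, dilation=1):
        super().__init__()
        # Pad only on the left so the kernel never covers future time steps.
        self.left_pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size, dilation=dilation)

    def forward(self, x):                    # x: (batch, channels, time)
        x = F.pad(x, (self.left_pad, 0))     # pad the time dimension on the left only
        return self.conv(x)                  # output length == input length
```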

3.3. Dilated Convolutions

  • A simple causal convolution can only look back over a history that grows linearly with depth, which limits long-term memory
  • The paper employs dilated convolutions (van den Oord et al., 2016) to enable an exponentially large receptive field (Yu & Koltun, 2016)
  • The dilation factors grow exponentially with depth (d = 1, 2, 4, ...)
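
A back-of-the-envelope sketch of why this helps: with kernel size k and dilations 1, 2, 4, ..., the visible history grows exponentially in the number of levels. This assumes one convolution per level; the paper's residual block uses two, which roughly doubles the (k - 1) term.

```python
def receptive_field(kernel_size: int, num_levels: int) -> int:
    """History (in time steps) visible to the top of a stack of dilated causal
    convolutions with one convolution per level and dilations 1, 2, 4, ..."""
    return 1 + (kernel_size - 1) * sum(2 ** i for i in range(num_levels))

print(receptive_field(kernel_size=3, num_levels=8))  # 1 + 2 * 255 = 511 time steps
```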

3.4. Residual Connections

  • Each residual block (see Figure 1 (b) and (c)) stacks two dilated causal convolution layers with weight normalization, ReLU, and dropout; a 1x1 convolution is added on the shortcut when the input and output widths differ, as sketched below
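
A minimal PyTorch sketch of such a residual block, under the same caveats as above (an illustration of the idea in Figure 1(b), not the authors' code; names are my own).

```python
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils import weight_norm

class TemporalBlock(nn.Module):
    """Residual block: two dilated causal conv layers, each with weight
    normalization, ReLU, and dropout, plus a 1x1 convolution on the shortcut
    when the input and output channel counts differ."""
    def __init__(self, n_in, n_out, kernel_size, dilation, dropout=0.2):
        super().__init__()
        self.left_pad = (kernel_size - 1) * dilation   # causal (left-only) padding
        self.conv1 = weight_norm(nn.Conv1d(n_in, n_out, kernel_size, dilation=dilation))
        self.conv2 = weight_norm(nn.Conv1d(n_out, n_out, kernel_size, dilation=dilation))
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(dropout)
        # 1x1 convolution so the residual addition is well defined when widths differ.
        self.downsample = nn.Conv1d(n_in, n_out, 1) if n_in != n_out else nn.Identity()

    def _causal(self, conv, x):
        return conv(F.pad(x, (self.left_pad, 0)))

    def forward(self, x):                              # x: (batch, channels, time)
        out = self.dropout(self.relu(self._causal(self.conv1, x)))
        out = self.dropout(self.relu(self._causal(self.conv2, out)))
        return self.relu(out + self.downsample(x))
```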

3.5. Discussion

Advantages

  • Parallelism
  • Flexible receptive field size (see the usage sketch after this list)
  • More stable gradients than RNNs
    • the TCN avoids exploding/vanishing gradients
      • because its backpropagation path differs from the temporal direction of the sequence
  • Low memory requirement for training
  • Variable-length inputs
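
For example, the receptive field can be adjusted simply by stacking more levels or widening the kernel. A hypothetical usage of the TemporalBlock sketch above, with dilation doubling at every level:

```python
import torch
import torch.nn as nn

# Hypothetical 4-level TCN built from the TemporalBlock sketch above (d = 1, 2, 4, 8).
channels = [1, 32, 32, 32, 32]
tcn = nn.Sequential(*[
    TemporalBlock(channels[i], channels[i + 1], kernel_size=3, dilation=2 ** i)
    for i in range(4)
])

x = torch.randn(8, 1, 200)   # (batch, input channels, sequence length)
y = tcn(x)                   # causal padding keeps the temporal length unchanged
print(y.shape)               # torch.Size([8, 32, 200])
```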

Disadvantages

  • Data storage during evaluation
    • an RNN only needs to keep a hidden state during evaluation, so it uses less memory than in training, whereas a TCN must keep the raw sequence up to its effective history length
  • Potentially different parameters when transferring between domains
    • a TCN may perform poorly when transferred from a domain that needs little memory to one that needs long memory
      • because its receptive field may not be sufficiently large

5. Experiments

[Table 1 and result figures from the paper: the generic TCN outperforms canonical recurrent baselines (LSTM, GRU, vanilla RNN) on most of the benchmark tasks]

