understanding-ai
An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
This paper (https://arxiv.org/abs/1803.01271) introduces Temporal Convolutional Networks (TCNs)
Summary
The paper shows empirically that a generic convolutional model (the Temporal Convolutional Network, TCN) outperforms recurrent networks on several sequence modeling tasks.
Abstract
- Convolutional networks should be regarded as a natural starting point for sequence modeling tasks
1. Introduction
- Recurrent models are usually the first approach to sequence modeling
- But there is research showing that convolutional models can reach state-of-the-art results
- This paper presents a generic TCN architecture that is applied across all tasks
- The TCN
  - is simpler and clearer than canonical recurrent networks
  - combines elements of modern convolutional architectures
  - outperforms baseline recurrent architectures
  - retains a longer effective memory and history
3. Temporal Convolutional Networks
- The paper aims for a simple and powerful architecture
- Characteristics of TCNs:
  - there is no information leakage from the future to the past
  - the architecture can take a sequence of any length and produce an output sequence of the same length
  - it uses residual layers and dilated convolutions
3.1. Sequence Modeling
- The goal of learning in sequence modeling setting is to find a network f that minimizes some expected loss between the actual outputs and the predictions.
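A minimal restatement of this objective in notation, following the paper's Section 3.1 (the loss L and the distribution over sequences are left abstract):

```latex
% Given an input sequence x_0, ..., x_T, produce predictions
% \hat{y}_0, ..., \hat{y}_T, where \hat{y}_t may depend only on
% x_0, ..., x_t (the causal constraint).
\hat{y}_0, \ldots, \hat{y}_T = f(x_0, \ldots, x_T)

% Learning seeks the network f that minimizes the expected loss:
f^{*} = \arg\min_{f} \; \mathbb{E}\left[ L\left(y_0, \ldots, y_T,\; f(x_0, \ldots, x_T)\right) \right]
```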
3.2. Causal Convolutions
- The TCN's principles:
  - the input and output have the same length
  - there is no leakage from the future into the past
- To achieve the second principle, the TCN uses causal convolutions
  - convolutions where an output at time t is convolved only with elements from time t and earlier in the previous layer
  - this is similar to the masked convolution of (van den Oord et al., 2016)
- TCN = 1D FCN + causal convolutions (see the sketch below)
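A minimal sketch of a causal convolution, assuming PyTorch (the paper is framework-agnostic, and the class name `CausalConv1d` is illustrative). Causality comes from padding only on the left, so the output at time t never sees inputs later than t:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        self.left_pad = kernel_size - 1      # pad only the past side
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size)

    def forward(self, x):                    # x: (batch, channels, time)
        x = F.pad(x, (self.left_pad, 0))     # zero-pad on the left (past)
        return self.conv(x)                  # output length == input length

x = torch.randn(1, 3, 10)                    # one sequence, 3 channels, 10 steps
y = CausalConv1d(3, 8, kernel_size=3)(x)
assert y.shape[-1] == x.shape[-1]            # same length in, same length out
```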
3.3. Dilated Convolutions
- A simple causal convolution can only look back at a history that grows linearly with network depth, which limits long-term memory
- The paper employs dilated convolutions (van den Oord et al., 2016) to enable an exponentially large receptive field (Yu & Koltun, 2016)
- Dilation factors grow exponentially with depth (d = 1, d = 2, d = 4, ...); the sketch below shows the resulting receptive-field arithmetic
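A back-of-the-envelope sketch of why exponential dilation yields an exponentially large receptive field (assuming one dilated convolution per level; the paper's residual block stacks two per level, roughly doubling each term):

```python
def receptive_field(kernel_size, num_levels):
    # Level i uses dilation d = 2**i, which adds (kernel_size - 1) * d
    # time steps of history on top of the current step.
    return 1 + sum((kernel_size - 1) * 2**i for i in range(num_levels))

print(receptive_field(kernel_size=3, num_levels=8))  # -> 511 time steps
```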
3.4. Residual Connections
- see Figure 1 (b) and (c): each residual block applies two dilated causal convolutions (each followed by weight normalization, ReLU, and dropout) and adds the block input back, through a 1x1 convolution when channel widths differ
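A sketch of this residual block, assuming PyTorch (the class and parameter names are illustrative, not the authors' reference code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils import weight_norm

class TemporalBlock(nn.Module):
    def __init__(self, n_in, n_out, kernel_size, dilation, dropout=0.2):
        super().__init__()
        # Causal left padding for a dilated convolution
        self.pad = (kernel_size - 1) * dilation
        self.conv1 = weight_norm(nn.Conv1d(n_in, n_out, kernel_size, dilation=dilation))
        self.conv2 = weight_norm(nn.Conv1d(n_out, n_out, kernel_size, dilation=dilation))
        self.dropout = nn.Dropout(dropout)
        # 1x1 convolution so the residual addition works when n_in != n_out
        self.downsample = nn.Conv1d(n_in, n_out, 1) if n_in != n_out else None

    def forward(self, x):  # x: (batch, channels, time)
        out = self.dropout(F.relu(self.conv1(F.pad(x, (self.pad, 0)))))
        out = self.dropout(F.relu(self.conv2(F.pad(out, (self.pad, 0)))))
        res = x if self.downsample is None else self.downsample(x)
        return F.relu(out + res)  # o = ReLU(x + F(x)), as in the paper

# A full TCN stacks these blocks with the dilation doubling at each level:
layers = [TemporalBlock(3 if i == 0 else 8, 8, kernel_size=3, dilation=2**i)
          for i in range(4)]
tcn = nn.Sequential(*layers)
y = tcn(torch.randn(1, 3, 100))  # output shape: (1, 8, 100)
```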
3.5. Discussion
Advantages
- Parallelism
- Flexible receptive field size
- Stable gradients compared to RNNs
  - a TCN avoids exploding/vanishing gradients because its backpropagation path is different from the temporal direction of the sequence
- Low memory requirement for training
- Variable-length inputs
Disadvantages
- Data storage during evaluation
  - an RNN only needs to keep a fixed-size hidden state at evaluation time, so it can use less memory at evaluation than during training; a TCN instead needs the raw sequence up to its effective history length
- Potential parameter change for a transfer of domain
  - a TCN may not transfer well from a domain that requires little memory to one that requires a much longer memory, because its receptive field may not be sufficiently large
5. Experiments