understanding-ai
An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
This paper (https://arxiv.org/abs/1803.01271) introduces Temporal Convolutional Networks (TCNs)
Summary
The paper shows empirically that a generic convolutional model (the Temporal Convolutional Network, TCN) outperforms recurrent networks on several sequence modeling tasks.
Abstract
- Convolutional networks should be regarded as a natural starting point for sequence modeling tasks
1. Introduction
- Recurrent models are usually the first approach to sequence modeling
- But there is research showing that convolutional models can reach state-of-the-art results
- This paper presents a generic TCN architecture that is applied across all tasks
- The TCN
  - is simpler and clearer than canonical recurrent networks
  - combines elements of modern convolutional architectures
  - outperforms baseline recurrent architectures
  - retains a longer effective memory and history
3. Temporal Convolutional Networks
- The paper aims for a simple and powerful architecture
- Characteristics of TCNs:
  - there is no information leakage from the future to the past
  - the architecture can take a sequence of any length and produce an output sequence of the same length
  - it uses residual layers and dilated convolutions
3.1. Sequence Modeling
- The goal of learning in sequence modeling setting is to find a network f that minimizes some expected loss between the actual outputs and the predictions.
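A minimal restatement of this objective in notation, following the paper's Section 3.1 (the loss L and the distribution over sequences are left abstract):

```latex
% Given an input sequence x_0, ..., x_T, produce predictions
% \hat{y}_0, ..., \hat{y}_T, where \hat{y}_t may depend only on
% x_0, ..., x_t (the causal constraint).
\hat{y}_0, \ldots, \hat{y}_T = f(x_0, \ldots, x_T)

% Learning seeks the network f that minimizes the expected loss:
f^{*} = \arg\min_{f} \; \mathbb{E}\left[ L\left(y_0, \ldots, y_T,\; f(x_0, \ldots, x_T)\right) \right]
```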
3.2. Causal Convolutions
- The TCN's principles:
  - the input and output have the same length
  - there is no leakage from the future into the past
- To achieve the second principle, the TCN uses causal convolutions
  - convolutions where an output at time t is convolved only with elements from time t and earlier in the previous layer
  - this is similar to the masked convolution of (van den Oord et al., 2016)
- TCN = 1D FCN + causal convolutions (see the sketch below)
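A minimal sketch of a causal convolution, assuming PyTorch (the paper is framework-agnostic, and the class name `CausalConv1d` is illustrative). Causality comes from padding only on the left, so the output at time t never sees inputs later than t:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        self.left_pad = kernel_size - 1      # pad only the past side
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size)

    def forward(self, x):                    # x: (batch, channels, time)
        x = F.pad(x, (self.left_pad, 0))     # zero-pad on the left (past)
        return self.conv(x)                  # output length == input length

x = torch.randn(1, 3, 10)                    # one sequence, 3 channels, 10 steps
y = CausalConv1d(3, 8, kernel_size=3)(x)
assert y.shape[-1] == x.shape[-1]            # same length in, same length out
```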
3.3. Dilated Convolutions
- A simple causal convolution can only look back at a history that grows linearly with network depth, which limits long-term memory
- The paper employs dilated convolutions (van den Oord et al., 2016) to enable an exponentially large receptive field (Yu & Koltun, 2016)
- Dilation factors grow exponentially with depth (d = 1, d = 2, d = 4, ...); the sketch below shows the resulting receptive-field arithmetic
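A back-of-the-envelope sketch of why exponential dilation yields an exponentially large receptive field (assuming one dilated convolution per level; the paper's residual block stacks two per level, roughly doubling each term):

```python
def receptive_field(kernel_size, num_levels):
    # Level i uses dilation d = 2**i, which adds (kernel_size - 1) * d
    # time steps of history on top of the current step.
    return 1 + sum((kernel_size - 1) * 2**i for i in range(num_levels))

print(receptive_field(kernel_size=3, num_levels=8))  # -> 511 time steps
```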
3.4. Residual Connections
- see Figure 1 (b) and (c): each residual block applies two dilated causal convolutions (each followed by weight normalization, ReLU, and dropout) and adds the block input back, through a 1x1 convolution when channel widths differ
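A sketch of this residual block, assuming PyTorch (the class and parameter names are illustrative, not the authors' reference code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils import weight_norm

class TemporalBlock(nn.Module):
    def __init__(self, n_in, n_out, kernel_size, dilation, dropout=0.2):
        super().__init__()
        # Causal left padding for a dilated convolution
        self.pad = (kernel_size - 1) * dilation
        self.conv1 = weight_norm(nn.Conv1d(n_in, n_out, kernel_size, dilation=dilation))
        self.conv2 = weight_norm(nn.Conv1d(n_out, n_out, kernel_size, dilation=dilation))
        self.dropout = nn.Dropout(dropout)
        # 1x1 convolution so the residual addition works when n_in != n_out
        self.downsample = nn.Conv1d(n_in, n_out, 1) if n_in != n_out else None

    def forward(self, x):  # x: (batch, channels, time)
        out = self.dropout(F.relu(self.conv1(F.pad(x, (self.pad, 0)))))
        out = self.dropout(F.relu(self.conv2(F.pad(out, (self.pad, 0)))))
        res = x if self.downsample is None else self.downsample(x)
        return F.relu(out + res)  # o = ReLU(x + F(x)), as in the paper

# A full TCN stacks these blocks with the dilation doubling at each level:
layers = [TemporalBlock(3 if i == 0 else 8, 8, kernel_size=3, dilation=2**i)
          for i in range(4)]
tcn = nn.Sequential(*layers)
y = tcn(torch.randn(1, 3, 100))  # output shape: (1, 8, 100)
```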
3.5. Discussion
Advantages
- Parallelism
- Flexible receptive field size
- Stable gradients compared to RNNs
  - a TCN avoids exploding/vanishing gradients because its backpropagation path is different from the temporal direction of the sequence
- Low memory requirement for training
- Variable-length inputs
Disadvantages
- Data storage during evaluation
  - an RNN only needs to keep a fixed-size hidden state at evaluation time, so it can use less memory at evaluation than during training; a TCN instead needs the raw sequence up to its effective history length
- Potential parameter change for a transfer of domain
  - a TCN may not transfer well from a domain that requires little memory to one that requires a much longer memory, because its receptive field may not be sufficiently large
5. Experiments