understanding-ai
Directional Self-Attention Network for RNN/CNN-Free Language Understanding
https://arxiv.org/abs/1709.04696
This is a primitive version of Bi-BloSAN (see #2 for more)
1. Introduction
- Multi-head attention: an attention layer that benefits from parallel computation
- The paper points out that earlier attention mechanisms in NLP were designed for Seq2Seq models
- Positional encoding injects temporal order, which plain attention otherwise ignores (see the sketch below)
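A minimal sketch (not from the DiSAN paper itself) of the Transformer-style sinusoidal positional encoding referred to above; the names `seq_len` and `d_model` are mine. It only shows how order information can be added to embeddings that attention would otherwise treat as an unordered set.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Transformer-style positional encoding: every position gets a unique
    pattern of sines and cosines, so attention can recover token order."""
    positions = np.arange(seq_len)[:, None]                    # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                         # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                           # (seq_len, d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                      # even dims use sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])                      # odd dims use cosine
    return pe

x = np.random.randn(5, 8)                                      # 5 tokens, 8-dim embeddings
x_with_order = x + sinusoidal_positional_encoding(5, 8)        # inject position info
```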
Features of DiSAN
- Context-aware representations
- Parallel computation
- Directional self-attention
- Fewer parameters than RNNs, the Transformer, etc.
w.r.t. means "with respect to"
3. Background
Self-Attention: the same sequence x provides both token i (query) and token j (key), so every token attends to the other tokens of the same sentence (see the sketch below)
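A minimal sketch of plain dot-product self-attention, assuming row-vector token embeddings; it shows the same x being used for both the i and j roles. Variable names are mine, not the paper's.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Dot-product self-attention: the same sequence x plays both the
    'query' role (token i) and the 'key/value' role (token j)."""
    scores = x @ x.T                        # (n, n): one score per (i, j) pair
    weights = softmax(scores, axis=-1)      # each row sums to 1
    return weights @ x                      # context-aware token representations

x = np.random.randn(5, 8)                   # 5 tokens, 8-dim embeddings
out = self_attention(x)                     # shape (5, 8)
```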
3.1 Multi-dimensional Attention
- Multi-dimensional attention: the alignment score for each token is a vector (one entry per feature) instead of a scalar, which makes the attention context-aware (see the sketch below)
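A rough sketch of the multi-dimensional ("source2token") idea: the score for each token is a vector with one entry per feature, so each feature dimension gets its own softmax over tokens. The two-layer scoring function and the `tanh` activation here are illustrative assumptions, not the paper's exact formula.

```python
import numpy as np

def softmax(z, axis=0):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_dim_attention(x, W1, b1, W2):
    """Each token gets a d-dim score vector (one score per feature), so the
    softmax over tokens is taken separately for every feature dimension."""
    scores = np.tanh(x @ W1 + b1) @ W2      # (n, d): score per (token, feature)
    weights = softmax(scores, axis=0)       # normalise over tokens, per feature
    return (weights * x).sum(axis=0)        # (d,): feature-wise weighted sum

n, d = 5, 8
x = np.random.randn(n, d)                   # 5 tokens, 8 features each
W1, b1, W2 = np.random.randn(d, d), np.zeros(d), np.random.randn(d, d)
sentence_vec = multi_dim_attention(x, W1, b1, W2)   # single sentence encoding
```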
3.3 Directional Self-Attention
- x -> h: the input sequence x is first passed through a fully connected layer before attention
- Token2token self-attention: every token attends to the other tokens of the same sequence
- Masks let the attention process only part of the token pairs (e.g., only the forward or only the backward direction)
- Masking: excluded positions get -inf added to their scores, so the softmax gives them a weight of exactly zero (see the sketch below)
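A small sketch of the -inf masking: adding -inf to a score before the softmax forces that attention weight to exactly zero, which is how the forward/backward masks make the self-attention directional. The mask shape below (token i may attend to positions j >= i) is a simplification chosen so every row keeps at least one valid position; names and shapes are mine.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def forward_mask(n):
    """M[i, j] = 0 where j >= i (token i attends to itself and later tokens),
    -inf everywhere else; the diagonal is kept so every row stays well-defined."""
    M = np.full((n, n), -np.inf)
    M[np.triu_indices(n, k=0)] = 0.0
    return M

n, d = 5, 8
x = np.random.randn(n, d)
scores = x @ x.T + forward_mask(n)   # masked pairs become -inf
weights = softmax(scores, axis=-1)   # exp(-inf) = 0, so masked weights are exactly 0
```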