understanding-ai
Bi-Directional Block Self-Attention for fast and memory-efficient sequence modeling
Paper at ICLR 2018: https://openreview.net/forum?id=H1cWzoxA- (aka Bi-BloSAN)
Abstract
- CNN focuses on local dependencies
- Bi-BloSAN achieves a better trade-off between speed and memory than RNN/CNN/SAN models
1. Introduction
- RNNs are hard to parallelize because of their sequential computation
- In CNNs, the number of operations needed to relate two positions grows with their distance
    - Convolutional Seq2Seq: linearly
    - ByteNet: logarithmically
- Bi-BloSAN inherits ideas from
    - Transformer
    - DiSAN (Directional Self-Attention Network)
        - forward/backward masks to encode directional (temporal order) information
        - feature-level (multi-dimensional) attention
- SAN (Self-Attention Network) models have a memory problem (memory grows quadratically with sequence length); Bi-BloSAN addresses it
- Core idea of Bi-BloSAN: split the sequence into blocks of equal length and process tokens block by block (see the sketch after this list)
- Bi-BloSAN benefits: memory-efficient and fast (the intra-block -> inter-block attention design makes this possible)
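The block-splitting idea can be sketched in a few lines of NumPy; the block length `r` and zero-padding of the last block are my own illustrative choices here, not necessarily the paper's exact scheme.

```python
import numpy as np

def split_into_blocks(x, r):
    """Split a sequence of token vectors into equal-length blocks.

    x: (n, d) array of n token embeddings; r: block length (a hyperparameter).
    The sequence is zero-padded so n becomes a multiple of r, then reshaped
    to (m, r, d) with m = ceil(n / r) blocks.
    """
    n, d = x.shape
    pad = (-n) % r                               # tokens needed to reach a multiple of r
    x = np.concatenate([x, np.zeros((pad, d))], axis=0)
    return x.reshape(-1, r, d)                   # (m, r, d): m blocks of r tokens

# Example: 10 tokens of dim 4 with block length 3 -> 4 blocks (last one padded)
print(split_into_blocks(np.random.randn(10, 4), r=3).shape)  # (4, 3, 4)
```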
2. Background
2.2 Vanilla Attention and Multi-dimensional Attention
- Additive attention empirically outperforms multiplicative (dot-product) attention in prediction quality, though it needs more memory and computation time (the additive, feature-level form is sketched below)
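A minimal NumPy sketch of additive, multi-dimensional (feature-level) attention, where the alignment score for each token is a vector rather than a scalar, so every feature gets its own distribution over tokens. The parameter names and the ReLU activation are assumptions for illustration, not the paper's exact choices.

```python
import numpy as np

def softmax(a, axis):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def multi_dim_additive_attention(x, q, params):
    """Feature-level (multi-dimensional) additive attention, sketched in NumPy.

    x: (n, d) token embeddings, q: (d,) query. The additive score follows
    f(x_i, q) = W^T * act(W1 x_i + W2 q + b1) + b, giving a d-dim score per
    token; softmax is taken over tokens separately for each feature.
    """
    W1, W2, W, b1, b = params
    h = np.maximum(x @ W1.T + q @ W2.T + b1, 0)   # (n, d) hidden layer (ReLU stand-in)
    scores = h @ W.T + b                          # (n, d) feature-wise alignment scores
    P = softmax(scores, axis=0)                   # attention per feature over the n tokens
    return (P * x).sum(axis=0)                    # (d,) attended summary

d = 4
params = tuple(np.random.randn(*s) * 0.1 for s in [(d, d), (d, d), (d, d), (d,), (d,)])
s = multi_dim_additive_attention(np.random.randn(6, d), np.random.randn(d), params)
print(s.shape)  # (4,)
```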
2.3 Two types of Self-attention
- token2token
    - the query x_j and the attended tokens x_i come from the same sequence x; models dependency between token pairs
- source2token
    - no explicit query; measures how important each token is to the entire sentence
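The two self-attention types can be contrasted by reusing `multi_dim_additive_attention()` from the sketch above; using a zero query to "remove" the query term in source2token is my simplification of dropping q from the score function.

```python
import numpy as np
# reuses multi_dim_additive_attention() from the previous sketch

def token2token(x, params):
    """token2token self-attention: each token x_j of the sequence x serves in
    turn as the query and attends over all tokens of the same sequence."""
    return np.stack([multi_dim_additive_attention(x, x_j, params) for x_j in x])  # (n, d)

def source2token(x, params):
    """source2token self-attention: no explicit query; it scores how important
    each token is to the whole sentence and returns one sentence vector."""
    zero_q = np.zeros(x.shape[1])    # stand-in for dropping the query term entirely
    return multi_dim_additive_attention(x, zero_q, params)                        # (d,)
```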
2.4 Masked Self-attention
- (Shen et al., 2017) adds a mask M to the alignment scores to allow attention in only one direction
- Forward mask M^fw
    - M_ij = 0 if i < j
    - M_ij = -inf otherwise
- The backward mask M^bw uses the opposite condition (i > j)
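A small sketch of the forward/backward masks defined in the bullets above; adding a mask to the alignment scores before the softmax blocks one direction.

```python
import numpy as np

def directional_masks(n):
    """Forward and backward masks for masked self-attention.

    Forward mask: M_ij = 0 if i < j, -inf otherwise, so a query token j can
    only attend to earlier tokens i < j. The backward mask flips the
    condition to i > j. The mask is added to the (n, n) alignment scores
    before the softmax, which zeroes out the disallowed positions.
    """
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    fw = np.where(i < j, 0.0, -np.inf)
    bw = np.where(i > j, 0.0, -np.inf)
    return fw, bw

fw, bw = directional_masks(4)
print(fw)  # zeros above the diagonal, -inf on and below it
```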
3. Proposed Model
m-BloSA (masked block self-attention): intra-block masked self-attention captures local context within each block, inter-block self-attention over block summaries captures long-range dependency, and a fusion step combines local and global context
Bi-BloSAN: two m-BloSA branches with forward and backward masks respectively; their outputs are combined and pooled by source2token self-attention into a sequence encoding
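A rough, assumption-heavy sketch of the intra-block -> inter-block data flow, reusing the helpers from the earlier sketches. The directional masks, the context-fusion gate, and separate parameters per stage are omitted, so this only illustrates where the memory saving comes from: attention is applied within blocks and over block summaries, never over the full n x n token grid.

```python
import numpy as np
# reuses split_into_blocks(), token2token(), source2token() from the sketches above

def m_blosa_sketch(x, r, params):
    """Simplified data flow of masked block self-attention (m-BloSA).

    1. Split the (n, d) sequence into m blocks of length r.
    2. Intra-block self-attention captures local context inside each block.
    3. Compress each block to one vector (source2token here) and run
       inter-block self-attention over the m block vectors to capture
       long-range dependency. The paper's fusion of local and global
       context into per-token outputs is omitted in this sketch.
    """
    blocks = split_into_blocks(x, r)                                  # (m, r, d)
    local = np.stack([token2token(b, params) for b in blocks])        # intra-block context
    block_vecs = np.stack([source2token(b, params) for b in local])   # (m, d) block summaries
    global_ctx = token2token(block_vecs, params)                      # inter-block context
    return local, global_ctx

# Bi-BloSAN would run two such branches (forward / backward masks), combine
# them, and pool with source2token attention to produce a sequence encoding.
```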
4. Experiments
Terminology
- x: input sequence
- q: query
- i: position (index) in the sequence
- P: attention probability matrix, i.e. the importance of each feature of each token given the evidence (x, q)