understanding-ai
Directional Self-Attention Network for RNN/CNN-Free Language Understanding
https://arxiv.org/abs/1709.04696
This is a primitive version of Bi-BloSAN (see #2 for more)
1. Introduction
- Multi-head attention: an attention layer that benefits from parallel computation
- The paper points out that earlier attention mechanisms in NLP were designed for Seq2Seq models
- Positional encoding injects temporal order, which plain attention otherwise ignores (see the sketch below)
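A minimal sketch (not from the DiSAN paper itself) of the Transformer-style sinusoidal positional encoding referred to above; the names `seq_len` and `d_model` are mine. It only shows how order information can be added to embeddings that attention would otherwise treat as an unordered set.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Transformer-style positional encoding: every position gets a unique
    pattern of sines and cosines, so attention can recover token order."""
    positions = np.arange(seq_len)[:, None]                    # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                         # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                           # (seq_len, d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                      # even dims use sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])                      # odd dims use cosine
    return pe

x = np.random.randn(5, 8)                                      # 5 tokens, 8-dim embeddings
x_with_order = x + sinusoidal_positional_encoding(5, 8)        # inject position info
```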
Features of DiSAN
- Context-aware representations
- Parallel computation
- Directional self-attention
- Fewer parameters than RNNs, the Transformer, etc.
w.r.t. means "with respect to"
3. Background
Self-Attention: the same sequence x provides both token i (query) and token j (key), so every token attends to the other tokens of the same sentence (see the sketch below)
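A minimal sketch of plain dot-product self-attention, assuming row-vector token embeddings; it shows the same x being used for both the i and j roles. Variable names are mine, not the paper's.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Dot-product self-attention: the same sequence x plays both the
    'query' role (token i) and the 'key/value' role (token j)."""
    scores = x @ x.T                        # (n, n): one score per (i, j) pair
    weights = softmax(scores, axis=-1)      # each row sums to 1
    return weights @ x                      # context-aware token representations

x = np.random.randn(5, 8)                   # 5 tokens, 8-dim embeddings
out = self_attention(x)                     # shape (5, 8)
```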
3.1 Multi-dimensional Attention
- Multi-dimensional attention: the alignment score for each token is a vector (one entry per feature) instead of a scalar, which makes the attention context-aware (see the sketch below)
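A rough sketch of the multi-dimensional ("source2token") idea: the score for each token is a vector with one entry per feature, so each feature dimension gets its own softmax over tokens. The two-layer scoring function and the `tanh` activation here are illustrative assumptions, not the paper's exact formula.

```python
import numpy as np

def softmax(z, axis=0):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_dim_attention(x, W1, b1, W2):
    """Each token gets a d-dim score vector (one score per feature), so the
    softmax over tokens is taken separately for every feature dimension."""
    scores = np.tanh(x @ W1 + b1) @ W2      # (n, d): score per (token, feature)
    weights = softmax(scores, axis=0)       # normalise over tokens, per feature
    return (weights * x).sum(axis=0)        # (d,): feature-wise weighted sum

n, d = 5, 8
x = np.random.randn(n, d)                   # 5 tokens, 8 features each
W1, b1, W2 = np.random.randn(d, d), np.zeros(d), np.random.randn(d, d)
sentence_vec = multi_dim_attention(x, W1, b1, W2)   # single sentence encoding
```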
3.3 Directional Self-Attention
- x -> h: the input sequence x is first passed through a fully connected layer before attention
- Token2token self-attention: every token attends to the other tokens of the same sequence
- Masks let the attention process only part of the token pairs (e.g., only the forward or only the backward direction)
- Masking: excluded positions get -inf added to their scores, so the softmax gives them a weight of exactly zero (see the sketch below)
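A small sketch of the -inf masking: adding -inf to a score before the softmax forces that attention weight to exactly zero, which is how the forward/backward masks make the self-attention directional. The mask shape below (token i may attend to positions j >= i) is a simplification chosen so every row keeps at least one valid position; names and shapes are mine.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def forward_mask(n):
    """M[i, j] = 0 where j >= i (token i attends to itself and later tokens),
    -inf everywhere else; the diagonal is kept so every row stays well-defined."""
    M = np.full((n, n), -np.inf)
    M[np.triu_indices(n, k=0)] = 0.0
    return M

n, d = 5, 8
x = np.random.randn(n, d)
scores = x @ x.T + forward_mask(n)   # masked pairs become -inf
weights = softmax(scores, axis=-1)   # exp(-inf) = 0, so masked weights are exactly 0
```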