Convolutional Sequence to Sequence Learning

Open flrngel opened this issue 6 years ago • 0 comments

aka Fairseq

https://arxiv.org/pdf/1705.03122.pdf

3. A Convolutional Architecture

P for the position embedding (the absolute position of each element in the sequence)

e for the input element representation (the word embedding plus the position embedding)
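A minimal sketch of how the two are combined, assuming the paper's definition e_j = w_j + p_j. The function name `embed_inputs` and the toy vectors are made up for illustration; real embeddings are learned.

```python
def embed_inputs(word_vecs, pos_vecs):
    """Return e_j = w_j + p_j for every position j (element-wise sum)."""
    return [
        [w + p for w, p in zip(w_j, p_j)]
        for w_j, p_j in zip(word_vecs, pos_vecs)
    ]


# Toy example: 2 input elements, embedding size 2 (values are arbitrary).
w = [[0.5, 0.25], [1.0, 2.0]]   # word embeddings
p = [[0.5, 0.5], [0.25, 0.25]]  # position embeddings
e = embed_inputs(w, p)
```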

(image from https://norman3.github.io/papers/docs/fairseq.html)

In the image above, the kernel width is 3 and the convolutional block stack size is 1 (a single block)
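A rough sketch of one such block in pure Python, assuming the structure described in the paper: a convolution over k neighbouring elements that outputs 2d values, a gated linear unit (GLU) that halves them back to d, and a residual connection from the block input. The names `conv_block` and `glu` and the zero padding on both sides (encoder style) are choices made here for illustration, not Fairseq code.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def glu(y):
    """Gated linear unit: split y (length 2d) into A and B, return A * sigmoid(B)."""
    d = len(y) // 2
    return [a * sigmoid(b) for a, b in zip(y[:d], y[d:])]


def conv_block(h, weight, bias, k=3):
    """One block: convolution over k neighbours, GLU, then residual.

    h:      list of n vectors, each of size d
    weight: 2d rows, each of length k * d
    bias:   list of 2d values
    """
    n, d = len(h), len(h[0])
    pad = k // 2
    zero = [0.0] * d
    # Zero-pad both ends so the output has the same length as the input.
    padded = [zero] * pad + h + [zero] * pad
    out = []
    for i in range(n):
        # Concatenate the k vectors in the window into one vector of length k * d.
        window = [x for vec in padded[i:i + k] for x in vec]
        y = [sum(wt * x for wt, x in zip(row, window)) + b
             for row, b in zip(weight, bias)]
        gated = glu(y)
        # Residual connection: add the block input at position i.
        out.append([g + r for g, r in zip(gated, h[i])])
    return out


# Toy example with d = 1 and kernel width 3.
h0 = [[1.0], [2.0], [3.0]]
W = [[1.0, 1.0, 1.0],   # A: sum of the window
     [0.0, 0.0, 0.0]]   # B: always 0, so the gate is sigmoid(0) = 0.5
b = [0.0, 0.0]
h1 = conv_block(h0, W, b, k=3)
```

Stacking more blocks would just mean feeding `h1` into another `conv_block` call with its own weights.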

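The decoder state summary d_i adds g_i, the embedding of the previous target element, much like a residual connection: d_i = W_d h_i + b_d + g_i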

Attention scores are the dot product of d_i with each encoder output z_j, normalized with a softmax over j

This helps stabilize learning
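The attention step can be sketched like this, assuming the paper's formulas for a single decoder position i: d_i = W_d h_i + b_d + g_i, a_ij = softmax_j(d_i · z_j), and the conditional input c_i = sum_j a_ij (z_j + e_j). The function name `decoder_attention` and the toy values are hypothetical, and the paper's extra scaling factors are left out.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def decoder_attention(h_i, g_i, z, e, W_d, b_d):
    """Return (attention weights a_i, conditional input c_i) for one decoder step.

    h_i: decoder state, g_i: embedding of the previous target element,
    z:   encoder outputs z_j, e: encoder input representations e_j.
    """
    # d_i = W_d h_i + b_d + g_i
    d_i = [dot(row, h_i) + b + g for row, b, g in zip(W_d, b_d, g_i)]
    # a_ij = softmax over j of d_i . z_j
    a_i = softmax([dot(d_i, z_j) for z_j in z])
    # c_i = sum_j a_ij * (z_j + e_j)
    dim = len(z[0])
    c_i = [
        sum(a * (z_j[t] + e_j[t]) for a, z_j, e_j in zip(a_i, z, e))
        for t in range(dim)
    ]
    return a_i, c_i


# Toy example: 2 source positions, dimension 2.
z = [[1.0, 0.0], [0.0, 1.0]]
e = [[1.0, 1.0], [1.0, 1.0]]
W_d = [[1.0, 0.0], [0.0, 1.0]]
b_d = [0.0, 0.0]
# With h_i and g_i both zero, d_i is zero and attention is uniform.
a_i, c_i = decoder_attention([0.0, 0.0], [0.0, 0.0], z, e, W_d, b_d)
```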

Jan 29 '18 02:01 flrngel