MoChA-pytorch
PyTorch Implementation of "Monotonic Chunkwise Attention" (ICLR 2018)
Requirements
- PyTorch 0.4
TODOs
- [x] Soft MoChA
- [x] Hard MoChA
- [ ] Linear Time Decoding
- [ ] Experiment with a real-world dataset
Model figure

Linear Time Decoding
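It's not clear whether the authors' TF implementation supports decoding in linear time. It calculates energies over the whole sequence of encoder outputs at every output step, instead of scanning forward from the previously attended encoder output and stopping once a position is selected.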
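This is a minimal sketch of what scanning from the previously attended encoder output could look like at test time. It is written in plain Python rather than PyTorch so the control flow stays visible, and it follows the hard-attention decoding rule as described in the referenced papers (select the first position whose selection probability reaches 0.5, then apply a softmax over a fixed-size chunk ending there). `hard_mocha_step`, `decode`, and the toy energy functions are hypothetical names for illustration; they are not part of this repository or of the authors' TF code. Keeping the previous index when no position is selected is also an assumption of this sketch.

```python
import math


def hard_mocha_step(monotonic_energy, chunk_energy, num_enc, prev_index, chunk_size):
    """One hard-attention step (sketch, hypothetical helper).

    monotonic_energy(j) and chunk_energy(k) return scalar energies for
    encoder positions j and k at the current output step.
    Returns (attended_index, {position: weight}), or (None, {}) if no
    position is selected.
    """
    # Scan forward from the previously attended position only.
    for j in range(prev_index, num_enc):
        # sigmoid(e) >= 0.5 is equivalent to e >= 0, so no sigmoid is needed.
        if monotonic_energy(j) >= 0.0:
            # Softmax over the chunk of up to chunk_size positions ending at j.
            start = max(0, j - chunk_size + 1)
            window = range(start, j + 1)
            scores = [chunk_energy(k) for k in window]
            peak = max(scores)  # subtract the max for numerical stability
            exps = [math.exp(s - peak) for s in scores]
            total = sum(exps)
            return j, {k: e / total for k, e in zip(window, exps)}
    # Nothing selected: the caller would use a zero context vector.
    return None, {}


def decode(monotonic_energy, chunk_energy, num_enc, num_steps, chunk_size):
    """Run several output steps, carrying the attended index forward."""
    index = 0
    alignments = []
    for i in range(num_steps):
        attended, weights = hard_mocha_step(
            lambda j: monotonic_energy(i, j),
            lambda k: chunk_energy(i, k),
            num_enc, index, chunk_size,
        )
        if attended is not None:
            index = attended  # assumption: keep the old index otherwise
        alignments.append((attended, weights))
    return alignments


# Toy energies (made up for illustration): step i selects position 2 * i.
calls = []


def toy_monotonic_energy(i, j):
    calls.append((i, j))  # record every energy evaluation
    return 1.0 if j >= 2 * i else -1.0


def toy_chunk_energy(i, k):
    return 0.0  # uniform weights inside the chunk


alignments = decode(toy_monotonic_energy, toy_chunk_energy,
                    num_enc=6, num_steps=3, chunk_size=2)
attended = [index for index, _ in alignments]
```

In this toy run the monotonic energy is evaluated 7 times across the three output steps, compared with 18 evaluations (3 steps × 6 encoder outputs) if energies were computed for the whole encoder sequence at each step. In general the scan evaluates at most one energy per encoder position plus one per output step, which is what makes decoding linear in the sequence lengths.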
References
- Colin Raffel, Minh-Thang Luong, Peter J. Liu, Ron J. Weiss and Douglas Eck. Online and Linear-Time Attention by Enforcing Monotonic Alignments (ICML 2017)
- Chung-Cheng Chiu and Colin Raffel. Monotonic Chunkwise Attention (ICLR 2018)