Jean Senellart
option to quantize model weights to INT16 (short) - reduces t7 file size by a factor of 2.
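A minimal sketch of the idea in NumPy (the helper names here are illustrative, not the actual OpenNMT option): each float32 tensor is mapped to int16 with a per-tensor scale, halving storage at the cost of a small rounding error.

```python
import numpy as np

def quantize_int16(w):
    """Map a float32 tensor to int16 with a per-tensor scale (sketch)."""
    scale = float(np.abs(w).max()) / 32767.0 if w.size else 1.0
    if scale == 0.0:
        scale = 1.0
    q = np.round(w / scale).astype(np.int16)
    return q, scale

def dequantize_int16(q, scale):
    """Recover an approximate float32 tensor from the int16 codes."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int16(w)
w_hat = dequantize_int16(q, s)
assert q.nbytes == w.nbytes // 2            # half the storage of float32
assert np.max(np.abs(w - w_hat)) < 1e-3     # small quantization error
```

The per-tensor scale keeps the scheme simple; finer-grained (per-row) scales would reduce the error further at a small storage cost.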
* implement label smoothing with a uniform smoothing distribution, as defined in [Szegedy, 2015](https://arxiv.org/pdf/1512.00567.pdf)
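A short sketch of the uniform variant from the paper: the gold token keeps probability `1 - eps`, and `eps` is spread uniformly over the vocabulary.

```python
import numpy as np

def smooth_labels(target, vocab_size, epsilon=0.1):
    """Uniform label smoothing: eps/V mass everywhere, (1 - eps) extra on gold."""
    dist = np.full(vocab_size, epsilon / vocab_size, dtype=np.float32)
    dist[target] += 1.0 - epsilon
    return dist

p = smooth_labels(target=2, vocab_size=5, epsilon=0.1)
# p sums to 1; the gold token at index 2 gets 0.9 + 0.1/5 = 0.92
```

Training then minimizes cross-entropy against this smoothed distribution instead of the one-hot target.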
In the [Attention Is All You Need](https://arxiv.org/pdf/1706.03762.pdf) paper, several concepts are introduced that can fit in our current attention module:
* so-called "Scaled Dot-Product Attention" - improving the `dot` model (option `-global_attention...`
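The scaled dot-product attention from the paper, sketched for a single head with no masking (NumPy, illustrative only): `Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V`.

```python
import numpy as np

def scaled_dot_attention(Q, K, V):
    """Single-head scaled dot-product attention, no masking (sketch)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # scale by sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V, weights

Q = np.random.randn(3, 8)   # 3 queries, d_k = 8
K = np.random.randn(5, 8)   # 5 keys
V = np.random.randn(5, 8)   # 5 values
out, w = scaled_dot_attention(Q, K, V)
```

The `1/sqrt(d_k)` factor is what distinguishes this from the plain `dot` score: it keeps the logits in a range where the softmax gradients stay usable as `d_k` grows.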
Variational dropout as described in [Gal et al., 2016](https://arxiv.org/pdf/1512.05287.pdf) does not give the expected results for NMT. Adding a new mode `variational_non_recurrent` for further exploration.
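The core idea from Gal et al. is to sample one dropout mask and reuse it at every time step of a sequence, rather than resampling per step as standard dropout does. A minimal sketch (NumPy, function name illustrative):

```python
import numpy as np

def variational_dropout(x, p=0.3, rng=None):
    """Apply the SAME dropout mask at every time step of a (time, features)
    sequence, as in Gal et al., 2016. Standard dropout resamples per step."""
    rng = rng or np.random.default_rng(0)
    mask = (rng.random(x.shape[-1]) >= p) / (1.0 - p)  # one mask, inverted scaling
    return x * mask                                     # broadcast over time axis

x = np.ones((4, 6))                  # 4 time steps, 6 features
y = variational_dropout(x, p=0.5)
# every time step sees the identical mask
```

The `variational_non_recurrent` mode mentioned above presumably varies where this locked mask is applied; the sketch only shows the mask-sharing mechanism itself.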
add local attention models and monotonic attention
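A sketch of the local (windowed) attention idea in the style of Luong et al.'s local-m variant (NumPy, illustrative; not the repository's implementation): the softmax is restricted to a window around an aligned source position, optionally with a Gaussian term favoring positions near the center.

```python
import numpy as np

def local_attention(scores, t, D=2, sigma=None):
    """Restrict attention to the window [t-D, t+D] around source position t.
    If sigma is given, reweight with a Gaussian centered on t (local-p style)."""
    S = scores.shape[0]
    lo, hi = max(0, t - D), min(S, t + D + 1)
    window = scores[lo:hi] - scores[lo:hi].max()   # stable softmax over window
    w = np.exp(window)
    w /= w.sum()
    if sigma is not None:
        pos = np.arange(lo, hi)
        w *= np.exp(-((pos - t) ** 2) / (2 * sigma ** 2))
        w /= w.sum()
    out = np.zeros(S)
    out[lo:hi] = w                                  # zero outside the window
    return out

a = local_attention(np.random.randn(10), t=4, D=2, sigma=1.0)
```

Monotonic attention additionally constrains the window center to move only forward over decoding steps; the windowing mechanics are the same.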
better memory optimization using the computation graph; implement first vertical memory sharing
experiment with coverage models:
- [Temporal Attention Model for Neural Machine Translation](https://arxiv.org/pdf/1608.02927.pdf)
- [Modeling Coverage for Neural Machine Translation](http://www.aclweb.org/anthology/P16-1008)
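Both papers extend attention with a coverage signal that tracks how much attention each source position has already received. A minimal sketch of the shared idea (illustrative, not either paper's exact formulation):

```python
import numpy as np

def attend_with_coverage(scores, coverage, beta=1.0):
    """Penalize source positions that already received attention mass.
    `coverage` is the running sum of past attention distributions."""
    adj = scores - beta * coverage       # discourage re-attending covered words
    adj -= adj.max()                     # numerical stability
    w = np.exp(adj)
    w /= w.sum()
    return w, coverage + w               # return weights and updated coverage

scores = np.random.randn(6)              # fixed alignment scores for 6 source words
cov = np.zeros(6)
w1, cov = attend_with_coverage(scores, cov)
w2, cov = attend_with_coverage(scores, cov)
# the most-attended position in step 1 is penalized in step 2
```

The cited papers learn this interaction (via a temporal normalization or a learned coverage update) rather than using a fixed penalty `beta`, but the coverage vector itself plays the same role.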