OpenNMT-py
[WIP v1 - deprecated] entmax 1.5 for attention and outputs, faster implementation of sparsemax
This pull request adds support for entmax 1.5, a sparse alternative to softmax, which we describe in our ACL paper, Sparse Sequence-to-Sequence Models. It uses the implementations of sparsemax and entmax 1.5 from the entmax package, available from pip.
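For anyone curious what the mapping computes: 1.5-entmax returns a probability vector of the form p_i = max(z_i/2 − τ, 0)², with the threshold τ chosen so the outputs sum to 1, which is what lets low-scoring entries become exactly zero. The entmax package provides fast, autograd-ready implementations; the snippet below is only a dependency-free sketch that finds τ by bisection (the function name and iteration count are illustrative, not the package's API):

```python
import math

def entmax15(z, n_iter=50):
    """Sketch of 1.5-entmax: p_i = max(z_i/2 - tau, 0)**2,
    with tau found by bisection so that sum(p) == 1."""
    z = [v / 2.0 for v in z]
    hi = max(z)       # at tau = hi, every term is clipped: sum is 0
    lo = hi - 1.0     # at tau = lo, the max entry alone contributes 1
    for _ in range(n_iter):
        tau = (lo + hi) / 2.0
        total = sum(max(v - tau, 0.0) ** 2 for v in z)
        if total > 1.0:
            lo = tau  # tau too small: mass exceeds 1, move up
        else:
            hi = tau
    tau = (lo + hi) / 2.0
    return [max(v - tau, 0.0) ** 2 for v in z]
```

Unlike softmax, scores sufficiently below the maximum get probability exactly 0, e.g. `entmax15([2.0, 1.0, -1.0])` zeroes out the last entry while the first two share all the mass.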
This pull request does not include support for entmax with other alpha values. I suspect that code will be a bit more involved, but I can get to it soon.
It also does not include support for entmax attention in transformers, but I can probably make that PR next week as well.
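To sketch where that future PR would slot in: entmax attention just replaces the softmax over attention scores with a sparse mapping, so some source positions receive exactly zero weight. Below is an illustrative single-query scaled dot-product attention using exact sparsemax (Martins & Astudillo, 2016) as the sparse mapping — names and shapes here are mine, not OpenNMT-py's actual attention code:

```python
import math

def sparsemax(z):
    """Exact sparsemax for one score vector: find the largest prefix k of the
    sorted scores with 1 + k*z_k > cumsum_k, then threshold and clip."""
    z_sorted = sorted(z, reverse=True)
    cumsum, tau = 0.0, 0.0
    for k, v in enumerate(z_sorted, start=1):
        cumsum += v
        if 1 + k * v > cumsum:
            tau = (cumsum - 1) / k
    return [max(v - tau, 0.0) for v in z]

def sparse_attention(query, keys, values):
    """Single-query scaled dot-product attention with sparsemax weights."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = sparsemax(scores)   # sums to 1, may contain exact zeros
    context = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return context, weights
```

With entmax 1.5 in place of sparsemax you get the same "hard zeros" property with a softer decay, which is what the attention PR would wire into the decoder.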
One potential issue is that our entmax code does not support Python 2. I don't know who still needs Python 2 support for OpenNMT.
Hi Ben, welcome back. Up to now, we have tried to keep the code Python 2 compatible (which is a requirement in Travis, as you can see). I do understand it is a bit obsolete (plus Python 3 is a requirement for distributed training), but is there much to do to make it compatible?
It probably would not require many changes, but it isn't really on our agenda, since Python 2 is only supported until the end of the year.