OpenNMT-py
[WIP v1 - deprecated] entmax 1.5 for attention and outputs, faster implementation of sparsemax
This pull request adds support for entmax 1.5, a sparse alternative to softmax, which we describe in our ACL paper, Sparse Sequence-to-Sequence Models. It uses the implementations of sparsemax and entmax 1.5 from the entmax package, available from pip.
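For anyone curious what the mapping computes: 1.5-entmax returns a probability vector of the form p_i = max(z_i/2 − τ, 0)², with the threshold τ chosen so the outputs sum to 1, which is what lets low-scoring entries become exactly zero. The entmax package provides fast, autograd-ready implementations; the snippet below is only a dependency-free sketch that finds τ by bisection (the function name and iteration count are illustrative, not the package's API):

```python
import math

def entmax15(z, n_iter=50):
    """Sketch of 1.5-entmax: p_i = max(z_i/2 - tau, 0)**2,
    with tau found by bisection so that sum(p) == 1."""
    z = [v / 2.0 for v in z]
    hi = max(z)       # at tau = hi, every term is clipped: sum is 0
    lo = hi - 1.0     # at tau = lo, the max entry alone contributes 1
    for _ in range(n_iter):
        tau = (lo + hi) / 2.0
        total = sum(max(v - tau, 0.0) ** 2 for v in z)
        if total > 1.0:
            lo = tau  # tau too small: mass exceeds 1, move up
        else:
            hi = tau
    tau = (lo + hi) / 2.0
    return [max(v - tau, 0.0) ** 2 for v in z]
```

Unlike softmax, scores sufficiently below the maximum get probability exactly 0, e.g. `entmax15([2.0, 1.0, -1.0])` zeroes out the last entry while the first two share all the mass.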
This pull request does not include support for entmax with other alpha values. I suspect that code will be a bit more involved, but I can get to it soon.
It also does not include support for entmax attention in transformers, but I can probably make that PR next week as well.
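To sketch where that future PR would slot in: entmax attention just replaces the softmax over attention scores with a sparse mapping, so some source positions receive exactly zero weight. Below is an illustrative single-query scaled dot-product attention using exact sparsemax (Martins & Astudillo, 2016) as the sparse mapping — names and shapes here are mine, not OpenNMT-py's actual attention code:

```python
import math

def sparsemax(z):
    """Exact sparsemax for one score vector: find the largest prefix k of the
    sorted scores with 1 + k*z_k > cumsum_k, then threshold and clip."""
    z_sorted = sorted(z, reverse=True)
    cumsum, tau = 0.0, 0.0
    for k, v in enumerate(z_sorted, start=1):
        cumsum += v
        if 1 + k * v > cumsum:
            tau = (cumsum - 1) / k
    return [max(v - tau, 0.0) for v in z]

def sparse_attention(query, keys, values):
    """Single-query scaled dot-product attention with sparsemax weights."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = sparsemax(scores)   # sums to 1, may contain exact zeros
    context = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return context, weights
```

With entmax 1.5 in place of sparsemax you get the same "hard zeros" property with a softer decay, which is what the attention PR would wire into the decoder.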
One potential issue is that our entmax code does not support Python 2. I don't know who still needs Python 2 support for OpenNMT.
Hi Ben, welcome back. Up to now, we have tried to keep the code Python 2 compatible (which is a requirement in Travis, as you can see). I do understand it is a bit obsolete (plus Python 3 is a requirement for distributed training), but is there much to do to make it compatible?
It probably would not require many changes, but it isn't really on our agenda, since Python 2 is only supported until the end of the year.