
Plans on implementing an external mask

Open gregkoytiger opened this issue 5 years ago • 1 comment

Great work on this code! One feature typically found in transformer models is the use of a mask to handle variable-length input sequences, such as in https://github.com/Lsdefine/attention-is-all-you-need-keras/blob/042ce3846b80dcebb169c856f378bfe26a18c6e4/transformer.py#L89
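For illustration, the kind of padding mask being referred to usually adds a large negative bias to the attention logits at padded positions before the softmax. The sketch below assumes this general pattern; the function names (`padding_mask`, `masked_attention_logits`) are hypothetical and are not part of keras-transformer's API.

```python
import numpy as np

def padding_mask(token_ids, pad_id=0):
    # Shape (batch, 1, seq_len): 1.0 where the token is padding, 0.0 otherwise.
    return (token_ids == pad_id).astype("float32")[:, None, :]

def masked_attention_logits(logits, mask):
    # logits: (batch, seq_len, seq_len) raw query-key scores
    # mask:   (batch, 1, seq_len) from padding_mask()
    # Padded keys get a large negative bias, so softmax gives them ~0 weight.
    return logits - 1e9 * mask

# Example: one sequence whose last two positions are padding (id 0).
ids = np.array([[5, 7, 3, 0, 0]])
scores = np.random.rand(1, 5, 5).astype("float32")
masked = masked_attention_logits(scores, padding_mask(ids))
```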

Is there any plan to implement this functionality?

gregkoytiger avatar Feb 28 '19 17:02 gregkoytiger

Hi! Yes, that is a reasonable feature. However, it currently has low priority, since I don't have much spare time and a similar result can be achieved by introducing a special "pad" word into the vocabulary (assuming you're using the transformer for an NLP problem), replacing all "unused" elements of the sequence with it, and letting the network itself learn an embedding that "will never be focused upon".
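A rough sketch of this workaround, assuming an NLP setting where one vocabulary id is reserved as the "pad" word. The names here (`PAD_ID`, `MAX_LEN`, `pad_sequence`) are illustrative, not keras-transformer identifiers.

```python
import numpy as np

PAD_ID = 0     # dedicated "pad" word included in the vocabulary
MAX_LEN = 128  # fixed sequence length fed to the model

def pad_sequence(token_ids, max_len=MAX_LEN, pad_id=PAD_ID):
    """Right-pad (or truncate) a list of token ids to a fixed length with the pad word."""
    ids = token_ids[:max_len]
    padded = np.full(max_len, pad_id, dtype="int32")
    padded[:len(ids)] = ids
    return padded

# Every training sequence gets the same length; no explicit attention mask is
# used, and the model learns that the pad embedding carries no useful signal.
batch = np.stack([pad_sequence([12, 845, 3, 99]),
                  pad_sequence([7, 7, 2])])
```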

I understand this is not exactly the same thing as masking and requires introducing this special word during training. If masking is critical for your needs, feel free to make the necessary changes yourself and send a pull request (with an example of how they are supposed to be used). I'll review and merge the changes.

kpot avatar Mar 04 '19 10:03 kpot