attention-learn-to-route Masking in SHA

Masking in SHA

Open shagharabbani opened this issue 2 years ago • 1 comment

Hi,

Would it be possible to apply masking only in the decoder's single-head attention (SHA)? As far as I can tell, masking is currently applied in both the multi-head attention (MHA) and the SHA in the decoder.

Best, Shaghayegh

Feb 23 '23 18:02 shagharabbani

Hi @shagharabbani, I think this would definitely be possible, but it is currently not implemented. I'm also not completely sure why you'd want that, but feel free to try it!

May 30 '23 13:05 wouterkool
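
For anyone who wants to try the idea discussed above, here is a minimal NumPy sketch of one decoding step in which the mask is always applied to the final single-head logits and only optionally to the multi-head glimpse. It is not code from this repository: `decode_step`, `mask_glimpse`, and all argument names are hypothetical, and the output projection of the glimpse is left out for brevity. It also assumes at least one node is still unvisited.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax; entries equal to -inf get probability 0.
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()

def decode_step(query, keys, values, logit_keys, visited,
                n_heads=2, mask_glimpse=False, clip=10.0):
    """One decoding step (hypothetical helper, not the repository's API).

    query:      (d,)   context embedding
    keys:       (n, d) glimpse keys, one row per node
    values:     (n, d) glimpse values
    logit_keys: (n, d) keys for the final single-head attention
    visited:    (n,)   boolean mask, True for infeasible nodes
    """
    n, d = keys.shape
    head_dim = d // n_heads

    # Multi-head attention (glimpse): split the embedding into heads.
    q = query.reshape(n_heads, head_dim)
    k = keys.reshape(n, n_heads, head_dim)
    v = values.reshape(n, n_heads, head_dim)
    compat = np.einsum("hd,nhd->hn", q, k) / np.sqrt(head_dim)
    if mask_glimpse:
        # Optional: hide visited nodes from the glimpse as well.
        compat = np.where(visited[None, :], -np.inf, compat)
    attn = np.stack([softmax(row) for row in compat])       # (n_heads, n)
    glimpse = np.einsum("hn,nhd->hd", attn, v).reshape(d)   # (d,)

    # Single-head attention: clipped logits, always masked.
    logits = clip * np.tanh(logit_keys @ glimpse / np.sqrt(d))
    logits = np.where(visited, -np.inf, logits)
    return softmax(logits)

# Example with random embeddings: nodes 0 and 2 are already visited.
rng = np.random.default_rng(0)
n, d = 5, 8
keys, values, logit_keys = rng.normal(size=(3, n, d))
query = rng.normal(size=d)
visited = np.array([True, False, True, False, False])

probs = decode_step(query, keys, values, logit_keys, visited, mask_glimpse=False)
```

With `mask_glimpse=False` the glimpse can still attend to visited nodes, but the final distribution assigns them zero probability, so the sampled action stays feasible either way.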
