attention-learn-to-route
Masking in SHA
Hi,
Would it be possible to apply masking only in the decoder's single-head attention (SHA)? As far as I can tell, masking is currently applied in both the multi-head attention (MHA) and the SHA in the decoder.
Best, Shaghayegh
Hi @shagharabbani, I think this would definitely be possible, but it is not currently implemented. I'm also not completely sure why you'd want that, but feel free to try it!
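
For anyone who wants to experiment, here is a rough sketch of the idea in plain Python. It is not the repository's code: the function names and the `mask_glimpse` / `mask_logits` flags are hypothetical, and the learned projections and logit clipping of the real decoder are left out for brevity. It only illustrates that the mask can be switched on and off separately for the MHA glimpse and for the final SHA that produces the node probabilities, and it assumes at least one node is still unvisited.

```python
import math


def softmax(scores):
    # Numerically stable softmax; entries equal to -inf get probability 0.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_scores(query, keys, visited, apply_mask):
    # Scaled dot-product compatibilities, optionally masking visited nodes.
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    if apply_mask:
        scores = [-math.inf if v else s for s, v in zip(scores, visited)]
    return scores


def decoder_step(query, node_embeddings, visited, num_heads=2,
                 mask_glimpse=True, mask_logits=True):
    # Hypothetical flags: mask_glimpse controls the MHA, mask_logits the SHA.
    head_dim = len(query) // num_heads

    # Multi-head "glimpse": each head attends over its own slice of the
    # embeddings (no learned projections in this sketch).
    glimpse = []
    for h in range(num_heads):
        sl = slice(h * head_dim, (h + 1) * head_dim)
        keys_h = [emb[sl] for emb in node_embeddings]
        weights = softmax(
            attention_scores(query[sl], keys_h, visited, mask_glimpse))
        for d in range(head_dim):
            glimpse.append(sum(w * k[d] for w, k in zip(weights, keys_h)))

    # Single-head attention: its softmax gives the next-node probabilities.
    logits = attention_scores(glimpse, node_embeddings, visited, mask_logits)
    return softmax(logits)


# Masking only in the SHA, as asked in the question.
query = [1.0, 0.0, 0.5, -0.5]
nodes = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0], [0.5, 0.5, -0.5, 0.5]]
visited = [True, False, False]
probs = decoder_step(query, nodes, visited,
                     mask_glimpse=False, mask_logits=True)
```

With `mask_glimpse=False` the glimpse can still attend to visited nodes, while `mask_logits=True` keeps visited nodes from being selected. Turning off `mask_logits` as well would allow infeasible nodes to be sampled, so that flag should probably stay on.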