x-transformers
Question: decoder attention mask?
I am trying to use x-transformers for language translation. In the original transformer paper, the target input to the decoder is masked so that attention only attends to the current and past tokens, not to future tokens. I didn't find a way to pass such a mask. Please advise.
If you use the Decoder module, the causal mask is added automatically; you don't need to pass one yourself.
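For example, here is a minimal encoder-decoder sketch based on the usage shown in the project README (the hyperparameters and sequence lengths are illustrative). The decoder side applies the causal mask internally; the only mask you pass explicitly is the padding mask for the source:

```python
import torch
from x_transformers import XTransformer

# Encoder-decoder model for translation; the decoder applies
# the causal (look-ahead) mask internally.
model = XTransformer(
    dim = 512,              # illustrative hyperparameters
    enc_num_tokens = 256,
    enc_depth = 6,
    enc_heads = 8,
    enc_max_seq_len = 1024,
    dec_num_tokens = 256,
    dec_depth = 6,
    dec_heads = 8,
    dec_max_seq_len = 1024
)

src = torch.randint(0, 256, (1, 1024))   # source token ids
tgt = torch.randint(0, 256, (1, 1024))   # target token ids
src_mask = torch.ones_like(src).bool()   # padding mask for the source only

# No target/causal mask is passed -- the Decoder adds it automatically.
loss = model(src, tgt, mask = src_mask)
loss.backward()
```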