PyTorch-Batch-Attention-Seq2seq
masking for attention coefficients
I cannot understand why there is no masking operation when computing the attention coefficients. Without a mask, the softmax assigns nonzero attention weights to the padded timesteps of the encoder outputs, so the decoder can attend to positions that carry no information.
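For reference, here is a minimal sketch of the kind of masking I would expect before the softmax. The names (`masked_attention`, `scores`, `src_lengths`) and tensor shapes are my own assumptions for illustration, not taken from this repo's code:

```python
import torch
import torch.nn.functional as F

def masked_attention(scores, src_lengths):
    # scores: (batch, src_len) raw attention energies for one decoder step
    # src_lengths: (batch,) true source lengths before padding (assumed available)
    batch_size, src_len = scores.size()
    # mask[i, j] is True where position j is padding, i.e. j >= src_lengths[i]
    mask = torch.arange(src_len, device=scores.device).unsqueeze(0) >= src_lengths.unsqueeze(1)
    # fill padded positions with -inf so softmax gives them exactly zero weight
    scores = scores.masked_fill(mask, float('-inf'))
    return F.softmax(scores, dim=1)
```

With this in place, a batch containing sequences of different lengths would never leak attention mass onto pad tokens, e.g. `masked_attention(scores, torch.tensor([5, 3]))` zeroes out positions 3 and 4 of the second example.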
Same question here.