pytorch-transformer
Clarification regarding dropout in the multihead attention block
Hi @hkproj
Why do you apply dropout to the attention scores (line 110 in model.py)? Shouldn't the dropout in the multi-head attention block be removed, since dropout is already applied in the residual connection block (line 81)?
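To make the question concrete, here is a minimal sketch of the two dropout placements I mean. It is not copied from model.py: the names `attention` and `ResidualConnection`, the pre-norm layout, the tensor shapes, and the dropout rate of 0.1 are all assumptions for illustration.

```python
import math

import torch
import torch.nn as nn


def attention(q, k, v, dropout=None):
    # Scaled dot-product attention. If a dropout module is given, it is
    # applied to the softmax-normalized weights before they multiply v.
    d_k = q.shape[-1]
    scores = (q @ k.transpose(-2, -1)) / math.sqrt(d_k)
    weights = scores.softmax(dim=-1)
    if dropout is not None:
        weights = dropout(weights)  # first dropout: on attention weights
    return weights @ v, weights


class ResidualConnection(nn.Module):
    # Hypothetical pre-norm residual wrapper (an assumption, not the
    # repository's exact class): dropout is applied to the sublayer
    # output before it is added back to the input.
    def __init__(self, features, p):
        super().__init__()
        self.norm = nn.LayerNorm(features)
        self.dropout = nn.Dropout(p)

    def forward(self, x, sublayer):
        # second dropout: on the sublayer output
        return x + self.dropout(sublayer(self.norm(x)))


torch.manual_seed(0)
x = torch.randn(2, 5, 8)  # (batch, seq_len, d_model), arbitrary sizes
attn_dropout = nn.Dropout(0.1)
residual = ResidualConnection(8, 0.1)

# Self-attention wrapped in the residual block: both dropouts are active.
out = residual(x, lambda t: attention(t, t, t, attn_dropout)[0])
```

In this sketch the two dropouts act on different tensors: the first zeroes individual attention weights, so a query position ignores some key positions for that step, while the second zeroes elements of the sublayer output just before the residual addition.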