annotated-transformer
MultiHeadedAttention: affine transforms
First of all, thank you for this work; it is really easy to follow along with this notebook.
My question is the following: in the `MultiHeadedAttention` class, you instantiate 4 affine layers instead of 4 linear ones (`bias` is `True` by default). Is this on purpose? If so, the text should be updated, since it mentions only the 4 weight matrices and no bias terms.
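To make the distinction concrete, here is a minimal plain-Python sketch. It assumes that the four layers are `nn.Linear(d_model, d_model)` with PyTorch's default `bias=True`, and it uses `d_model = 512` (the base-model size) purely as an assumed example value. The helpers `apply_projection` and `projection_param_count` are hypothetical illustrations, not code from the notebook. With a bias, the map sends the zero vector to `b` and is not additive, so it is affine rather than linear. Each layer also carries `d_model` extra parameters.

```python
# Hypothetical helpers illustrating the question; not code from the notebook.

def apply_projection(W, x, b=None):
    """Compute y = W x (+ b) for one vector, the map an nn.Linear layer applies.

    Without b the map is linear; with b it is affine.
    """
    y = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]
    if b is not None:
        y = [y_i + b_i for y_i, b_i in zip(y, b)]
    return y


def projection_param_count(d_model, n_layers=4, bias=True):
    """Parameters in n_layers square projections (d_model x d_model weights)."""
    per_layer = d_model * d_model + (d_model if bias else 0)
    return n_layers * per_layer


W = [[1, 2], [3, 4]]
b = [0.5, -1.0]
x, y = [1, 2], [3, -1]
x_plus_y = [xi + yi for xi, yi in zip(x, y)]

# Linear map: f(0) = 0 and f(x + y) = f(x) + f(y).
print(apply_projection(W, [0, 0]))
print(apply_projection(W, x_plus_y),
      [p + q for p, q in zip(apply_projection(W, x), apply_projection(W, y))])

# Affine map: f(0) = b, and additivity fails because b is added twice on the right.
print(apply_projection(W, [0, 0], b))
print(apply_projection(W, x_plus_y, b),
      [p + q for p, q in zip(apply_projection(W, x, b), apply_projection(W, y, b))])

# Extra parameters introduced by the biases, assuming d_model = 512.
print(projection_param_count(512), projection_param_count(512, bias=False))
```

In PyTorch, passing `bias=False` to `nn.Linear` would give purely linear projections that match a description mentioning only the four weight matrices. In this example the bias accounts for 2,048 of roughly one million projection parameters.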