
MultiHeadAttention parameter setting

Open LXXiaogege opened this issue 1 year ago • 2 comments

Is the output linear layer of the `MultiHeadAttention` class in `mha.py` parameterized incorrectly? Shouldn't its `in_features` be `heads * d_k`?

LXXiaogege avatar Apr 30 '23 09:04 LXXiaogege
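To illustrate the point being raised: the concatenation of the attention heads has `heads * d_k` features, so that is the natural `in_features` of the output projection. The sketch below is a hypothetical minimal reimplementation (not the repo's actual `mha.py`), written so it also works when `heads * d_k != d_model`:

```python
import torch
import torch.nn as nn


class MiniMultiHeadAttention(nn.Module):
    """Minimal multi-head attention sketch highlighting the output layer shape."""

    def __init__(self, d_model: int, heads: int, d_k: int):
        super().__init__()
        self.heads = heads
        self.d_k = d_k
        # Per-head projections map d_model -> heads * d_k features
        self.query = nn.Linear(d_model, heads * d_k)
        self.key = nn.Linear(d_model, heads * d_k)
        self.value = nn.Linear(d_model, heads * d_k)
        # The concatenated head outputs have heads * d_k features, so the
        # output layer's in_features must be heads * d_k, not d_model;
        # the two coincide only when heads * d_k == d_model.
        self.output = nn.Linear(heads * d_k, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [seq_len, batch, d_model]
        seq_len, batch, _ = x.shape
        q = self.query(x).view(seq_len, batch, self.heads, self.d_k)
        k = self.key(x).view(seq_len, batch, self.heads, self.d_k)
        v = self.value(x).view(seq_len, batch, self.heads, self.d_k)
        # Scaled dot-product attention per head
        scores = torch.einsum('ibhd,jbhd->ijbh', q, k) / self.d_k ** 0.5
        attn = scores.softmax(dim=1)
        out = torch.einsum('ijbh,jbhd->ibhd', attn, v)
        # Concatenate heads: [seq_len, batch, heads * d_k]
        out = out.reshape(seq_len, batch, self.heads * self.d_k)
        return self.output(out)
```

With `d_model=16`, `heads=4`, `d_k=8` (so `heads * d_k = 32 != d_model`) this still runs, which a `nn.Linear(d_model, d_model)` output layer would not.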

The `get_positional_encoding` method of the positional encoder raises an error when `d_model` is odd.

LXXiaogege avatar May 02 '23 14:05 LXXiaogege
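The odd-`d_model` failure comes from the standard sinusoidal-encoding pattern: the sine term fills the `ceil(d_model / 2)` even columns, but the cosine term, computed from the same frequencies, has one column too many for the `floor(d_model / 2)` odd columns. A hypothetical sketch of a fix (modeled on the usual implementation, not the repo's exact code) is to truncate the cosine term:

```python
import math
import torch


def get_positional_encoding(d_model: int, max_len: int = 5000) -> torch.Tensor:
    """Sinusoidal positional encodings that also work for odd d_model."""
    encodings = torch.zeros(max_len, d_model)
    position = torch.arange(0, max_len, dtype=torch.float32).unsqueeze(1)
    two_i = torch.arange(0, d_model, 2, dtype=torch.float32)
    div_term = torch.exp(two_i * -(math.log(10000.0) / d_model))
    # Even columns: ceil(d_model / 2) of them, matching sin's width
    encodings[:, 0::2] = torch.sin(position * div_term)
    # Odd columns: only floor(d_model / 2) of them, so truncate the
    # cosine term; without the slice this assignment fails for odd d_model
    encodings[:, 1::2] = torch.cos(position * div_term)[:, :d_model // 2]
    return encodings
```

For example, `get_positional_encoding(7)` then returns a `(5000, 7)` tensor instead of raising a shape-mismatch error.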

Our implementation assumes that `heads * d_k == d_model`. We need to change that.

vpj avatar Jun 30 '23 10:06 vpj
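That assumption explains why the reported shape issue does not surface in practice: when `d_k` is derived as `d_model // heads` (hypothetical numbers below), the two parameterizations of the output layer coincide.

```python
import torch.nn as nn

# Hypothetical configuration; the point is the assumption d_k = d_model // heads
d_model, heads = 512, 8
d_k = d_model // heads  # 64, so heads * d_k == d_model

# Under that assumption, writing the output projection either way
# produces a layer with identical shapes
out_a = nn.Linear(d_model, d_model)
out_b = nn.Linear(heads * d_k, d_model)
assert out_a.in_features == out_b.in_features == 512
```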