keras-transformer
Return correct output shape for MultiHeadAttention
In contrast to `MultiHeadSelfAttention`, `MultiHeadAttention` has two inputs but only one output. The current implementation does not override `compute_output_shape`, which by default returns the input shapes unmodified. The layer should instead return only the shape of the decoder input.
Without this change, model construction fails when the sequence lengths of the encoder and decoder differ.
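
The shape logic can be sketched without Keras as a plain function. This is only an illustration and not the code from this change: the helper name `decoder_output_shape` is hypothetical, and the input order `[encoder_shape, decoder_shape]` is an assumption that should be checked against how the layer unpacks its inputs.

```python
from typing import Optional, Sequence, Tuple

Shape = Tuple[Optional[int], ...]


def decoder_output_shape(input_shape: Sequence[Sequence[Optional[int]]]) -> Shape:
    """Return the shape of the decoder input as the layer's output shape.

    Hypothetical helper. Assumes input_shape is [encoder_shape, decoder_shape];
    this ordering is an assumption, not something confirmed by the library.
    """
    is_pair = isinstance(input_shape, (list, tuple)) and len(input_shape) == 2
    if not is_pair or not all(isinstance(s, (list, tuple)) for s in input_shape):
        raise ValueError("Expected a list of two shapes: [encoder, decoder]")
    _encoder_shape, decoder_shape = input_shape
    # The output has one row per decoder position, so the encoder's
    # sequence length must not leak into the reported output shape.
    return tuple(decoder_shape)


# Encoder sequence length 20, decoder sequence length 10, model dimension 64.
shape = decoder_output_shape([(None, 20, 64), (None, 10, 64)])
print(shape)
```

Inside the layer, an overridden `compute_output_shape(self, input_shape)` could return the same value, so that downstream layers see a single shape rather than the list of both input shapes.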