keras-transformer Return correct output shape for MultiHeadAttention

Return correct output shape for MultiHeadAttention

Open Callidior opened this issue 5 years ago • 0 comments

In contrast to MultiHeadSelfAttention, MultiHeadAttention has two inputs but only one output. The current implementation does not override compute_output_shape, which by default returns the input shapes unmodified. Instead, only the shape of the decoder input should be returned. Otherwise, model construction fails with errors whenever the sequence lengths of the encoder and decoder differ.
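
A minimal sketch of the shape logic that an overridden compute_output_shape could delegate to. It is written as a plain function with no Keras dependency so it can be run on its own. The helper name decoder_output_shape and the decoder_index parameter are hypothetical, and the assumption that the decoder input is the second element of the input list should be checked against the layer's call method before relying on it.

```python
def decoder_output_shape(input_shapes, decoder_index=1):
    """Return only the decoder input's shape from a pair of input shapes.

    Hypothetical helper, not part of keras-transformer. decoder_index=1
    assumes the inputs are ordered [encoder, decoder]; adjust it if the
    layer orders its inputs differently.
    """
    if not isinstance(input_shapes, (list, tuple)) or len(input_shapes) != 2:
        raise ValueError("expected a list of exactly two input shapes")
    # The attention output has one position per decoder time step, so its
    # shape follows the decoder input rather than the encoder input.
    return tuple(input_shapes[decoder_index])


# Encoder and decoder sequence lengths differ (20 vs. 12).
encoder_shape = (None, 20, 64)
decoder_shape = (None, 12, 64)
print(decoder_output_shape([encoder_shape, decoder_shape]))  # (None, 12, 64)
```

Inside the layer, the override would then presumably reduce to a one-line method that returns decoder_output_shape(input_shape), so downstream layers see a single shape rather than the unmodified list of both input shapes.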

Sep 06 '19 12:09 Callidior
