keras-io
Why doesn't keras.layers.MultiHeadAttention divide the query dimension by the number of heads?
from tensorflow import keras
from tensorflow.keras import layers

class TransformerBlock(layers.Layer):
    def __init__(self, embed_dim, num_heads, ff_dim, rate=0.1):
        super(TransformerBlock, self).__init__()
        self.att = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)
        self.ffn = keras.Sequential(
            [layers.Dense(ff_dim, activation="relu"), layers.Dense(embed_dim)]
        )
        self.layernorm1 = layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = layers.LayerNormalization(epsilon=1e-6)
        self.dropout1 = layers.Dropout(rate)
        self.dropout2 = layers.Dropout(rate)

    def call(self, inputs, training):
        attn_output = self.att(inputs, inputs)
        attn_output = self.dropout1(attn_output, training=training)
        out1 = self.layernorm1(inputs + attn_output)
        ffn_output = self.ffn(out1)
        ffn_output = self.dropout2(ffn_output, training=training)
        return self.layernorm2(out1 + ffn_output)
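For reference, a minimal usage sketch (assuming TensorFlow 2.x; the batch size of 1 and sequence length of 10 are arbitrary) that builds the block with the values discussed below and prints the parameter count:

```python
import tensorflow as tf

block = TransformerBlock(embed_dim=32, num_heads=2, ff_dim=32)
dummy = tf.zeros((1, 10, 32))        # (batch, seq_len, embed_dim)
_ = block(dummy, training=False)     # first call builds all sublayers
print(block.count_params())          # -> 10656
```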
With embed_dim = 32, ff_dim = 32, and num_heads = 2, this layer has 10656 trainable parameters, i.e.

(32*32 + 32)*2*3 + (32*64 + 32) + (32*32 + 32)*2 + 32*2*2 = 10656

(Q/K/V projections for 2 heads of size 32, the output projection from 64 back to 32, the two Dense layers of the FFN, and the two LayerNorms). But a standard multi-head attention block, where each head has size embed_dim / num_heads = 16, would have

(32*16 + 16)*2*3 + (32*32 + 32) + (32*32 + 32)*2 + 32*2*2 = 6464

parameters. Is something wrong with this layer?
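The two totals above can be reproduced with plain Python arithmetic. This is a sketch of the breakdown implied by the question (Q/K/V projections, output projection, FFN, LayerNorms), assuming each head gets its own key_dim-sized kernel plus bias:

```python
embed_dim, num_heads, key_dim, ff_dim = 32, 2, 32, 32

# key_dim = embed_dim: each head projects 32 -> 32 for Q, K and V.
qkv = (embed_dim * key_dim + key_dim) * num_heads * 3                   # 6336
out = num_heads * key_dim * embed_dim + embed_dim                       # 2080, concat(64) -> 32
ffn = (embed_dim * ff_dim + ff_dim) + (ff_dim * embed_dim + embed_dim)  # 2112
ln  = 2 * embed_dim * 2                                                 # 128, two LayerNorms
print(qkv + out + ffn + ln)                                             # -> 10656

# "Standard" sizing with per-head dimension embed_dim // num_heads = 16.
head_dim = embed_dim // num_heads
qkv_std = (embed_dim * head_dim + head_dim) * num_heads * 3             # 3168
out_std = num_heads * head_dim * embed_dim + embed_dim                  # 1056, concat(32) -> 32
print(qkv_std + out_std + ffn + ln)                                     # -> 6464
```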
Hi @zhanlie2008, thanks for reaching out. The issue backlog is not the best place for support requests. Can you repost this to one of the forums below? That way your question will have more visibility.
Thanks!