
Why doesn't keras.layers.MultiHeadAttention divide the query dimension by the number of heads?

Open zhanlie2008 opened this issue 3 years ago • 1 comment

zhanlie2008 avatar Jun 03 '22 13:06 zhanlie2008

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers


class TransformerBlock(layers.Layer):
    def __init__(self, embed_dim, num_heads, ff_dim, rate=0.1):
        super(TransformerBlock, self).__init__()
        # Note: key_dim is the size of each attention head, so Q/K/V are
        # projected to num_heads * key_dim dimensions.
        self.att = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)
        self.ffn = keras.Sequential(
            [layers.Dense(ff_dim, activation="relu"), layers.Dense(embed_dim)]
        )
        self.layernorm1 = layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = layers.LayerNormalization(epsilon=1e-6)
        self.dropout1 = layers.Dropout(rate)
        self.dropout2 = layers.Dropout(rate)

    def call(self, inputs, training):
        # Self-attention with residual connection and layer norm.
        attn_output = self.att(inputs, inputs)
        attn_output = self.dropout1(attn_output, training=training)
        out1 = self.layernorm1(inputs + attn_output)
        # Position-wise feed-forward network with residual connection.
        ffn_output = self.ffn(out1)
        ffn_output = self.dropout2(ffn_output, training=training)
        return self.layernorm2(out1 + ffn_output)
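
As a quick sanity check (my addition, not in the original post, and assuming the imports and class definition above), the block can be built on a dummy batch and its parameter count inspected; the sequence length of 10 is arbitrary:

block = TransformerBlock(embed_dim=32, num_heads=2, ff_dim=32)
# Calling the block once on a dummy input builds all sub-layers.
_ = block(tf.zeros((1, 10, 32)), training=False)
print(block.count_params())  # 10656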

If embed_dim = 32, ff_dim = 32, and num_heads = 2, this layer has 10656 parameters to learn, i.e.

(32*32 + 32)*2*3 + (32*64 + 32) + (32*32 + 32)*2 + 32*2*2 = 10656

(Q/K/V projections, output projection, feed-forward network, and the two LayerNorms). But a normal multi-head attention block, where each head has size embed_dim / num_heads, would have

(32*(32/2)*2 + (32/2)*2)*3 + (32*32 + 32) + (32*32 + 32)*2 + 32*2*2 = 6464

parameters. Is something wrong with this layer?
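
Here is a minimal sketch (my addition, not from the issue) of where the gap comes from. In Keras, key_dim is the size of each individual head, so key_dim=embed_dim projects Q/K/V to num_heads * embed_dim = 64 dimensions, while the standard Transformer convention corresponds to key_dim = embed_dim // num_heads:

import tensorflow as tf
from tensorflow.keras import layers

embed_dim, num_heads, seq_len = 32, 2, 10  # seq_len is an arbitrary choice
x = tf.zeros((1, seq_len, embed_dim))

# key_dim = embed_dim, as in the TransformerBlock above:
# each of Q/K/V is projected from 32 to num_heads * key_dim = 64.
mha_wide = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)
_ = mha_wide(x, x)              # call once so the weights are built
print(mha_wide.count_params())  # 8416 = 3*(32*64 + 64) + (64*32 + 32)

# key_dim = embed_dim // num_heads, the usual Transformer convention:
# each of Q/K/V is projected from 32 to num_heads * key_dim = 32.
mha_std = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim // num_heads)
_ = mha_std(x, x)
print(mha_std.count_params())   # 4224 = 3*(32*32 + 32) + (32*32 + 32)

# Adding the feed-forward network (2112) and the two LayerNorms (128) gives
# 8416 + 2112 + 128 = 10656 versus 4224 + 2112 + 128 = 6464, the two totals above.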


Hi @zhanlie2008, thanks for reaching out. The issue backlog is not the best place for support requests. Can you repost this to one of the forums below? That way your question will have more visibility.

Thanks!

pcoet avatar Aug 23 '23 11:08 pcoet