Model detail: Why does "attention_heads" exist in "modeling.py"?
It looks like the length of "attention_heads" is always 1 in the "transformer_model" function, yet the code in "modeling.py" still has an "if-else" statement around it. Can we remove "attention_heads" and the "if-else" statement?
The code in question (lines 833 to 855 of modeling.py):
with tf.variable_scope("attention"):
  attention_heads = []
  with tf.variable_scope("self"):
    attention_head = attention_layer(
        from_tensor=layer_input,
        to_tensor=layer_input,
        attention_mask=attention_mask,
        num_attention_heads=num_attention_heads,
        size_per_head=attention_head_size,
        attention_probs_dropout_prob=attention_probs_dropout_prob,
        initializer_range=initializer_range,
        do_return_2d_tensor=True,
        batch_size=batch_size,
        from_seq_length=seq_length,
        to_seq_length=seq_length)
    attention_heads.append(attention_head)

  attention_output = None
  if len(attention_heads) == 1:
    attention_output = attention_heads[0]
  else:
    # In the case where we have other sequences, we just concatenate
    # them to the self-attention head before the projection.
    attention_output = tf.concat(attention_heads, axis=-1)
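For reference, a minimal sketch of the simplification the question is asking about, assuming the single self-attention call above is the only head ever appended (attention_layer and all of its arguments are the existing ones from modeling.py):

with tf.variable_scope("attention"):
  with tf.variable_scope("self"):
    # With only one head, the list and the if/else collapse to a single assignment.
    attention_output = attention_layer(
        from_tensor=layer_input,
        to_tensor=layer_input,
        attention_mask=attention_mask,
        num_attention_heads=num_attention_heads,
        size_per_head=attention_head_size,
        attention_probs_dropout_prob=attention_probs_dropout_prob,
        initializer_range=initializer_range,
        do_return_2d_tensor=True,
        batch_size=batch_size,
        from_seq_length=seq_length,
        to_seq_length=seq_length)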
Yes, I have the same question. len(attention_heads) is always 1.
That's weird. len(attention_heads) is always 1.
Also, len(attention_heads) should always be 1.
If len(attention_heads) > 1, the shape of attention_output after the concat will be different, so it could not be passed into the next attention block as-is. The same issue applies to the residual op layer_output + attention_output.
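To make the shape point concrete, here is a small self-contained TensorFlow sketch with made-up sizes. Concatenating a second head widens the last dimension of attention_output, which is why the in-code comment above stresses that extra heads are joined before the projection; modeling.py then projects back to hidden_size with a dense layer before the residual add.

import tensorflow as tf

# Hypothetical 2D shapes: [batch_size * seq_length, num_heads * size_per_head] = [8, 4].
layer_input = tf.ones([8, 4])
self_head = tf.ones([8, 4])    # output of the single self-attention call
extra_head = tf.ones([8, 4])   # a second, hypothetical attention head

one_head = self_head                                     # len(attention_heads) == 1 branch
two_heads = tf.concat([self_head, extra_head], axis=-1)  # len(attention_heads) > 1 branch

print(one_head.shape)   # (8, 4): same width as layer_input
print(two_heads.shape)  # (8, 8): wider; it only stays compatible because a dense
                        # projection back to hidden_size follows before the residual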
@jacobdevlin-google, why do you keep the attention_heads list?
I think its purpose is to combine the outputs of all previous layers, apply a linear transformation, and feed the result to the next layer, rather than using only the output of the previous layer.
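The in-code comment about "other sequences" points the same way: the list reads like an extension hook, where a second attention head (for example, attention over another sequence) could be appended and the else branch would concatenate it with the self-attention head before the projection. A hypothetical sketch only, not something the released code does; other_input, other_mask, and other_seq_length are made-up names:

with tf.variable_scope("attention"):
  attention_heads = []
  with tf.variable_scope("self"):
    attention_heads.append(attention_layer(
        from_tensor=layer_input,
        to_tensor=layer_input,
        attention_mask=attention_mask,
        num_attention_heads=num_attention_heads,
        size_per_head=attention_head_size,
        attention_probs_dropout_prob=attention_probs_dropout_prob,
        initializer_range=initializer_range,
        do_return_2d_tensor=True,
        batch_size=batch_size,
        from_seq_length=seq_length,
        to_seq_length=seq_length))
  with tf.variable_scope("cross"):
    attention_heads.append(attention_layer(
        from_tensor=layer_input,
        to_tensor=other_input,            # hypothetical second sequence
        attention_mask=other_mask,        # hypothetical mask for that sequence
        num_attention_heads=num_attention_heads,
        size_per_head=attention_head_size,
        attention_probs_dropout_prob=attention_probs_dropout_prob,
        initializer_range=initializer_range,
        do_return_2d_tensor=True,
        batch_size=batch_size,
        from_seq_length=seq_length,
        to_seq_length=other_seq_length))  # hypothetical length of that sequence

  # len(attention_heads) is now 2, so the else branch concatenates the heads
  # before the dense projection back to hidden_size.
  attention_output = tf.concat(attention_heads, axis=-1)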