Model detail: Why does "attention_heads" exist in "modeling.py"?
It looks like the length of "attention_heads" is always 1 in the "transformer_model" function, yet the code in "modeling.py" still has an "if-else" statement around it. Can we remove "attention_heads" and the "if-else" statement?
The code in question (lines 833 to 855 of modeling.py):
with tf.variable_scope("attention"):
  attention_heads = []
  with tf.variable_scope("self"):
    attention_head = attention_layer(
        from_tensor=layer_input,
        to_tensor=layer_input,
        attention_mask=attention_mask,
        num_attention_heads=num_attention_heads,
        size_per_head=attention_head_size,
        attention_probs_dropout_prob=attention_probs_dropout_prob,
        initializer_range=initializer_range,
        do_return_2d_tensor=True,
        batch_size=batch_size,
        from_seq_length=seq_length,
        to_seq_length=seq_length)
    attention_heads.append(attention_head)

  attention_output = None
  if len(attention_heads) == 1:
    attention_output = attention_heads[0]
  else:
    # In the case where we have other sequences, we just concatenate
    # them to the self-attention head before the projection.
    attention_output = tf.concat(attention_heads, axis=-1)
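For reference, a minimal sketch of the simplification the question is asking about, assuming the single self-attention call above is the only head ever appended (attention_layer and all of its arguments are the existing ones from modeling.py):

with tf.variable_scope("attention"):
  with tf.variable_scope("self"):
    # With only one head, the list and the if/else collapse to a single assignment.
    attention_output = attention_layer(
        from_tensor=layer_input,
        to_tensor=layer_input,
        attention_mask=attention_mask,
        num_attention_heads=num_attention_heads,
        size_per_head=attention_head_size,
        attention_probs_dropout_prob=attention_probs_dropout_prob,
        initializer_range=initializer_range,
        do_return_2d_tensor=True,
        batch_size=batch_size,
        from_seq_length=seq_length,
        to_seq_length=seq_length)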
Yes, I have the same question. len(attention_heads) is always 1.
That's weird. len(attention_heads) is always 1.
Also, len(attention_heads) should always be 1.
If len(attention_heads) > 1, the shape of attention_output after the concat will be different, so it could not be passed into the next attention block as-is. The same issue applies to the residual op layer_output + attention_output.
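To make the shape point concrete, here is a small self-contained TensorFlow sketch with made-up sizes. Concatenating a second head widens the last dimension of attention_output, which is why the in-code comment above stresses that extra heads are joined before the projection; modeling.py then projects back to hidden_size with a dense layer before the residual add.

import tensorflow as tf

# Hypothetical 2D shapes: [batch_size * seq_length, num_heads * size_per_head] = [8, 4].
layer_input = tf.ones([8, 4])
self_head = tf.ones([8, 4])    # output of the single self-attention call
extra_head = tf.ones([8, 4])   # a second, hypothetical attention head

one_head = self_head                                     # len(attention_heads) == 1 branch
two_heads = tf.concat([self_head, extra_head], axis=-1)  # len(attention_heads) > 1 branch

print(one_head.shape)   # (8, 4): same width as layer_input
print(two_heads.shape)  # (8, 8): wider; it only stays compatible because a dense
                        # projection back to hidden_size follows before the residual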
@jacobdevlin-google, why do you keep the attention_heads list?
I think its purpose is to combine the outputs of all previous layers, apply a linear transformation, and feed the result to the next layer, rather than using only the output of the previous layer.
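The in-code comment about "other sequences" points the same way: the list reads like an extension hook, where a second attention head (for example, attention over another sequence) could be appended and the else branch would concatenate it with the self-attention head before the projection. A hypothetical sketch only, not something the released code does; other_input, other_mask, and other_seq_length are made-up names:

with tf.variable_scope("attention"):
  attention_heads = []
  with tf.variable_scope("self"):
    attention_heads.append(attention_layer(
        from_tensor=layer_input,
        to_tensor=layer_input,
        attention_mask=attention_mask,
        num_attention_heads=num_attention_heads,
        size_per_head=attention_head_size,
        attention_probs_dropout_prob=attention_probs_dropout_prob,
        initializer_range=initializer_range,
        do_return_2d_tensor=True,
        batch_size=batch_size,
        from_seq_length=seq_length,
        to_seq_length=seq_length))
  with tf.variable_scope("cross"):
    attention_heads.append(attention_layer(
        from_tensor=layer_input,
        to_tensor=other_input,            # hypothetical second sequence
        attention_mask=other_mask,        # hypothetical mask for that sequence
        num_attention_heads=num_attention_heads,
        size_per_head=attention_head_size,
        attention_probs_dropout_prob=attention_probs_dropout_prob,
        initializer_range=initializer_range,
        do_return_2d_tensor=True,
        batch_size=batch_size,
        from_seq_length=seq_length,
        to_seq_length=other_seq_length))  # hypothetical length of that sequence

  # len(attention_heads) is now 2, so the else branch concatenates the heads
  # before the dense projection back to hidden_size.
  attention_output = tf.concat(attention_heads, axis=-1)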