mesh
Bias in SelfAttention
When running the transformer, there is no bias in SelfAttention, whereas mesh_tensorflow/bert does have a bias in its self-attention. What does relative_attention_type mean in transformer_layer.SelfAttention? And how can I get a bias in transformer_layer.SelfAttention?