Tree-Transformer
Question about the paper
In the attention figure, I noticed that each token's attention to itself is 0, i.e., the diagonal of the attention matrix is 0, which means a token cannot attend to itself. Does this affect performance? Has it been verified experimentally? Thanks.
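To make the question concrete, here is a minimal sketch of the behavior I am describing. It is not taken from the repository; `softmax_no_self` is a hypothetical helper and the scores are made-up numbers. It assumes the zero diagonal comes from excluding the self-position before normalizing, and that each row has at least two entries.

```python
import math

def softmax_no_self(scores):
    """Row-wise softmax over a square score matrix, excluding the diagonal.

    Hypothetical helper for illustration only; assumes at least 2 columns.
    """
    out = []
    for i, row in enumerate(scores):
        # Subtract the largest off-diagonal score for numerical stability.
        m = max(s for j, s in enumerate(row) if j != i)
        # The self-position (j == i) gets weight 0 instead of exp(score).
        exps = [0.0 if j == i else math.exp(s - m) for j, s in enumerate(row)]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

# Made-up scores where the diagonal would otherwise dominate.
scores = [
    [2.0, 0.5, -1.0],
    [0.1, 3.0, 0.2],
    [-0.5, 0.4, 1.5],
]
attn = softmax_no_self(scores)
for row in attn:
    print([round(w, 3) for w in row])
```

Each row still sums to 1, but all of the weight is spread over the other tokens, so a token's own representation can only reach its output through some other path, such as a residual connection, if the model has one.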