hello world!
> In the paper, [Attention is All You Need](https://arxiv.org/pdf/1706.03762.pdf), query, key, and value are linearly transformed without bias in the multi-head attention.
> However, the variables in your code are transformed...
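For reference, here is a minimal NumPy sketch of the bias-free Q/K/V projections the paper describes, with a `use_bias` switch to show the variant the quote is objecting to. The weights, shapes, and function name are illustrative placeholders, not the repository's actual implementation.

```python
import numpy as np

def qkv_projections(x, d_model, num_heads, use_bias=False, seed=0):
    """Illustrative Q/K/V projections for multi-head attention.

    In "Attention is All You Need" the projections are plain matrix
    multiplications (no bias term); use_bias=True mimics a dense layer
    that adds a bias. Weights are random placeholders for demonstration.
    """
    rng = np.random.default_rng(seed)
    W_q = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    W_k = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    W_v = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    b = rng.standard_normal(d_model) if use_bias else 0.0

    q = x @ W_q + b  # bias-free when use_bias=False, as in the paper
    k = x @ W_k + b
    v = x @ W_v + b

    # split the model dimension into heads: (batch, len, heads, d_head)
    batch, seq_len, _ = x.shape
    d_head = d_model // num_heads
    split = lambda t: t.reshape(batch, seq_len, num_heads, d_head)
    return split(q), split(k), split(v)

# example: batch of 2 sequences, length 5, hidden size 96, 1 head
x = np.random.default_rng(1).standard_normal((2, 5, 96))
q, k, v = qkv_projections(x, d_model=96, num_heads=1, use_bias=False)
print(q.shape, k.shape, v.shape)  # (2, 5, 1, 96) each
```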
Hi, I used all default parameters and got worse results than the ones you presented: EM: 67.975, F1: 78.015 with the following parameters: `--hidden=96 --num_heads=1 --num_steps=35000`. I don't know why this happens.
I tried changing the character embedding size and training again with the following parameters: `--hidden=96 --num_heads=1 --num_steps=35000 --char_emb_size=200` (the value used in the original paper), and got EM: 69.196, F1: 78.66.
@localminimum Thank you for your answer! I just want to compare with the first row of the listed results. The result I get (EM: 67.975, F1: 78.015) is worse...