hello world!
> In the paper, [Attention is All You Need](https://arxiv.org/pdf/1706.03762.pdf), query, key, and value are linearly transformed without bias in the multi-head attention.
> However, the variables in your code are transformed...
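For reference, here is a minimal NumPy sketch of the bias-free Q/K/V projections the paper describes, with a `use_bias` switch to show the variant the quote is objecting to. The weights, shapes, and function name are illustrative placeholders, not the repository's actual implementation.

```python
import numpy as np

def qkv_projections(x, d_model, num_heads, use_bias=False, seed=0):
    """Illustrative Q/K/V projections for multi-head attention.

    In "Attention is All You Need" the projections are plain matrix
    multiplications (no bias term); use_bias=True mimics a dense layer
    that adds a bias. Weights are random placeholders for demonstration.
    """
    rng = np.random.default_rng(seed)
    W_q = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    W_k = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    W_v = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    b = rng.standard_normal(d_model) if use_bias else 0.0

    q = x @ W_q + b  # bias-free when use_bias=False, as in the paper
    k = x @ W_k + b
    v = x @ W_v + b

    # split the model dimension into heads: (batch, len, heads, d_head)
    batch, seq_len, _ = x.shape
    d_head = d_model // num_heads
    split = lambda t: t.reshape(batch, seq_len, num_heads, d_head)
    return split(q), split(k), split(v)

# example: batch of 2 sequences, length 5, hidden size 96, 1 head
x = np.random.default_rng(1).standard_normal((2, 5, 96))
q, k, v = qkv_projections(x, d_model=96, num_heads=1, use_bias=False)
print(q.shape, k.shape, v.shape)  # (2, 5, 1, 96) each
```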
Hi, I used all default parameters and got worse results than the ones you presented: EM: 67.975, F1: 78.015 with the following parameters: `--hidden=96 --num_heads=1 --num_steps=35000`. I don't know why this happens.
I tried changing the character embedding size and training again with the following parameters: `--hidden=96 --num_heads=1 --num_steps=35000 --char_emb_size=200` (the value used in the original paper), and got EM: 69.196, F1: 78.66.
@localminimum Thank you for your answer! I just want to compare with the first row of the listed results. The result I get (EM: 67.975, F1: 78.015) is worse...