Andy
Sorry, but I have not implemented the model with SQuAD 2.0 yet.
You could take a look at this thread: https://github.com/pytorch/pytorch/issues/2341. Set the DataLoader's `num_workers` to 0 so the data loads in the main process and you can see the actual error.
I don't think it will make a big difference, since -1e30 is already an extremely large negative value and should mask correctly.
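
To illustrate the point, here is a minimal sketch in plain Python (not the repository's code; `masked_softmax` and the additive mask form are my own assumptions for illustration). It adds -1e30 to the padded positions before the softmax, and their probabilities underflow to exactly zero:

```python
import math

def masked_softmax(logits, mask, neg=-1e30):
    """Softmax over `logits`, ignoring positions where mask is 0.

    Hypothetical helper for illustration: `mask` holds 1 for real
    tokens and 0 for padding, and `neg` is added to padded logits.
    """
    # Additive masking: real tokens keep their logit, padding gets ~-1e30.
    masked = [x + (1 - m) * neg for x, m in zip(logits, mask)]
    # Subtract the maximum for numerical stability.
    peak = max(masked)
    exps = [math.exp(x - peak) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]

# The last two positions are padding.
probs = masked_softmax([2.0, 1.0, 0.5, 3.0], [1, 1, 0, 0])
print(probs)
```

The padded positions get zero probability and the rest matches a softmax over the two real tokens only, so a more negative constant would not change the result here.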
What hidden size did you use? I have tried 96 and 128. 128 performs better. You can try tuning the hidden size.
The hyperparameters of this repository are mostly based on “NLPLearn/QANet”, so the results are similar. I have tried to reproduce the results of the paper, but with limited resources,...
I have implemented [QANet](https://github.com/andy840314/QANet-pytorch-), mostly based on this repository and another TensorFlow implementation, [Tensorflow QANet](https://github.com/NLPLearn/QANet). It reaches F1 75.0 and EM 64.0 in 60,000 steps. You could take...
@BangLiu I'm not sure, but I will try adding EMA first.
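
For reference, by EMA I mean an exponential moving average of the weights. A rough sketch in plain Python, assuming scalar parameters for simplicity (the `EMA` class, its method names, and the decay value are hypothetical, chosen only for illustration):

```python
class EMA:
    """Keeps an exponential moving average ("shadow") of named values."""

    def __init__(self, decay):
        self.decay = decay
        self.shadow = {}

    def register(self, name, value):
        # Start the average from the parameter's current value.
        self.shadow[name] = value

    def update(self, name, value):
        # shadow <- decay * shadow + (1 - decay) * value
        new = self.decay * self.shadow[name] + (1.0 - self.decay) * value
        self.shadow[name] = new
        return new

# Illustrative decay only; real training would use a value close to 1.
ema = EMA(decay=0.5)
ema.register("w", 0.0)
for step_value in [1.0, 1.0, 1.0]:
    ema.update("w", step_value)
print(ema.shadow["w"])
```

During training the update would run after each optimizer step, and the shadow values would be swapped in for evaluation.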