Excalibur
Hello~ Sorry for my late reply; I just noticed these new issues. The document-pair dataset is generated from a clustering dataset in which each document carries an event id...
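The pair-generation step is only sketched in the reply above, so here is a minimal, hypothetical illustration of how labeled pairs could be derived from event ids. It assumes each document is a `(doc_id, event_id, text)` tuple and that a pair counts as positive when both documents share an event id; `make_document_pairs` and the tuple layout are illustrative names, not the author's code.

```python
from itertools import combinations

def make_document_pairs(docs):
    """Build labeled document pairs from a clustered dataset (hypothetical sketch).

    docs: list of (doc_id, event_id, text) tuples.
    Returns (doc_id_a, doc_id_b, label) tuples, with label 1 when both
    documents share an event id and 0 otherwise.
    """
    pairs = []
    for (id_a, ev_a, _), (id_b, ev_b, _) in combinations(docs, 2):
        pairs.append((id_a, id_b, 1 if ev_a == ev_b else 0))
    return pairs

docs = [
    ("d1", "e1", "..."),
    ("d2", "e1", "..."),
    ("d3", "e2", "..."),
]
print(make_document_pairs(docs))
# [('d1', 'd2', 1), ('d1', 'd3', 0), ('d2', 'd3', 0)]
```

Enumerating every pair grows quadratically with the number of documents, so a real pipeline may well subsample the negative pairs; whether the original dataset does so is not stated.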
Thanks!
I have uploaded a data sample here: https://pan.baidu.com/s/1EAMeoxGq3h2f3vyV_x5vVg (extraction code: mvtx)
Hi, we let BERT encode the first N words of each document (N = the maximum length BERT can process) and take the output at the [CLS] position as the encoding of that document...
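To make the truncate-then-[CLS] idea concrete, here is a minimal sketch that uses only the standard library. `toy_encoder` is a hypothetical stand-in for BERT that returns one vector per input token; the helper names are illustrative. One assumption worth flagging: BERT's length limit (512 for the standard checkpoints) counts the `[CLS]` and `[SEP]` tokens, so at most `max_len - 2` document tokens fit.

```python
from typing import Callable, List, Sequence

CLS, SEP = "[CLS]", "[SEP]"

def build_bert_input(tokens: Sequence[str], max_len: int = 512) -> List[str]:
    """Keep only the first tokens that fit, then wrap them in [CLS] ... [SEP].

    max_len includes the two special tokens, so at most max_len - 2
    document tokens are kept.
    """
    body = list(tokens[: max_len - 2])
    return [CLS] + body + [SEP]

def doc_embedding(tokens: Sequence[str],
                  encoder: Callable[[List[str]], List[List[float]]],
                  max_len: int = 512) -> List[float]:
    """Encode the truncated document and return the vector at position 0 ([CLS])."""
    hidden_states = encoder(build_bert_input(tokens, max_len))
    return hidden_states[0]

# Hypothetical stand-in for BERT: one 2-d vector per input token.
def toy_encoder(seq: List[str]) -> List[List[float]]:
    return [[float(i), float(len(seq))] for i, _ in enumerate(seq)]

words = "the quick brown fox jumps over the lazy dog".split()
print(build_bert_input(words, max_len=6))
# ['[CLS]', 'the', 'quick', 'brown', 'fox', '[SEP]']
print(doc_embedding(words, toy_encoder, max_len=6))
# [0.0, 6.0]
```

With a real model, the limit applies to WordPiece tokens rather than whitespace-separated words, and the [CLS] vector corresponds to position 0 of the encoder's last hidden layer (for example, `last_hidden_state[:, 0]` in the Hugging Face `transformers` BERT outputs).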
@JerryZeyu I have seen the same behavior. I suspect some of the optimizer state is not fully saved, but I haven't figured out the exact reason yet.
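The suspicion above can be checked on a toy problem. The sketch below is a hypothetical, stdlib-only Adam on a scalar (not the repository's optimizer). It shows that resuming from a checkpoint reproduces uninterrupted training only if the optimizer's moment estimates and step count are restored along with the weight.

```python
import math

class TinyAdam:
    """Minimal scalar Adam (illustrative); its state is (m, v, t)."""
    def __init__(self, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, b1, b2, eps
        self.m, self.v, self.t = 0.0, 0.0, 0

    def step(self, w, grad):
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * grad
        self.v = self.b2 * self.v + (1 - self.b2) * grad * grad
        m_hat = self.m / (1 - self.b1 ** self.t)  # bias-corrected first moment
        v_hat = self.v / (1 - self.b2 ** self.t)  # bias-corrected second moment
        return w - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)

    def state_dict(self):
        return {"m": self.m, "v": self.v, "t": self.t}

    def load_state_dict(self, state):
        self.m, self.v, self.t = state["m"], state["v"], state["t"]

def grad(w):
    # Gradient of the toy loss (w - 3)^2.
    return 2.0 * (w - 3.0)

def train(w, opt, steps):
    for _ in range(steps):
        w = opt.step(w, grad(w))
    return w

# Reference run: 20 uninterrupted steps.
w_ref = train(0.0, TinyAdam(), 20)

# "Checkpoint" after 10 steps: keep the weight and the optimizer state.
opt = TinyAdam()
w_ckpt = train(0.0, opt, 10)
saved_state = opt.state_dict()

# Resume with the restored optimizer state.
resumed = TinyAdam()
resumed.load_state_dict(saved_state)
w_full = train(w_ckpt, resumed, 10)

# Resume with a fresh optimizer (only the weight was saved).
w_weights_only = train(w_ckpt, TinyAdam(), 10)

print(w_full == w_ref)          # True
print(w_weights_only == w_ref)  # False
```

If this is the cause in PyTorch, the usual remedy is to save `optimizer.state_dict()` (and any LR scheduler's `state_dict()`) in the checkpoint next to `model.state_dict()`. Random-number-generator state can also make a resumed run diverge, so that is another candidate.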
@ygjia Well done! Thanks!
@hengruo May I ask what your current best performance is? I found a few things that differ from the paper: 1. Your learning rate is not fixed...
@InitialBug May I ask what your current best performance is? 1. In the QANet paper, they use a warm-up and then keep the learning rate fixed at lr=0.001 after 1000 steps. I revised it...
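The warm-up described above can be sketched as a plain function of the step count. The QANet paper calls it an "inverse exponential" increase from 0 to 0.001 over the first 1000 steps. The logarithmic curve below is one common reading of that phrase and should be treated as an assumption; `qanet_lr` is an illustrative name.

```python
import math

def qanet_lr(step: int, base_lr: float = 1e-3, warmup_steps: int = 1000) -> float:
    """Learning rate at a 0-based training step (assumed log-shaped warm-up).

    Rises from 0 at step 0 to base_lr at step warmup_steps - 1, then stays
    fixed at base_lr for the rest of training.
    """
    return base_lr * min(1.0, math.log(step + 1) / math.log(warmup_steps))

for s in (0, 9, 99, 999, 5000):
    print(s, qanet_lr(s))
# 0 -> 0.0, 9 -> ~3.33e-4, 99 -> ~6.67e-4, 999 and later -> 1e-3
```

In PyTorch, a function like this could be passed to `torch.optim.lr_scheduler.LambdaLR` as the multiplicative factor `lambda s: qanet_lr(s) / base_lr`, with the optimizer's initial learning rate set to `base_lr`.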
Another problem is that we don't seem to have an exponential moving average (EMA) of the weights here.
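For reference, a weight EMA keeps a shadow copy of every parameter and updates it after each optimizer step with `shadow = decay * shadow + (1 - decay) * param`. The shadow weights are then used for evaluation. The QANet paper reports a decay rate of 0.9999 applied to all trainable variables. The sketch below is a hypothetical, framework-free version over a dict of scalars, and `WeightEMA` is an illustrative name.

```python
from typing import Dict

class WeightEMA:
    """Exponential moving average of named parameters (illustrative sketch)."""
    def __init__(self, params: Dict[str, float], decay: float = 0.9999):
        self.decay = decay
        self.shadow = dict(params)  # start the average at the current weights

    def update(self, params: Dict[str, float]) -> None:
        # Call once after every optimizer step.
        for name, value in params.items():
            self.shadow[name] = self.decay * self.shadow[name] + (1 - self.decay) * value

# A small decay makes the averaging visible in a few steps.
ema = WeightEMA({"w": 0.0}, decay=0.5)
for w in (1.0, 1.0, 1.0):
    ema.update({"w": w})
print(ema.shadow["w"])  # 0.875
```

In a PyTorch training loop, the same idea would keep one shadow tensor per entry of `model.named_parameters()` and temporarily copy the shadows into the model for evaluation.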
@InitialBug I uploaded my implementation, based on this repository, to https://github.com/BangLiu/QANet-PyTorch. You are welcome to test my code. With this implementation, memory usage blows up at batch_size 32, but my...