Hyper-parameter tuning for VDPWI
According to @daemon, VDPWI works: https://github.com/castorini/Castor/tree/master/vdpwi
But its effectiveness is still below the state of the art because the hyper-parameters haven't been tuned yet.
The old implementation was about 0.5 points off for Pearson's r on the test set; now it's closer to 2. The biggest changes from the old implementation are the switch to torchtext and PyTorch 0.4. The model code itself hasn't changed.
I ran 216 experiments over the following grid: decay=[0.99, 0.95], lr=[5e-4, 1e-4], batch_size=[8, 16], momentum=[0, 0.15, 0.05], rnn_hidden_dim=[128, 256, 512], epochs=[10, 15, 20]. In all cases I use RMSprop for optimization, following the paper.
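The sweep above is an exhaustive grid (2 × 2 × 2 × 3 × 3 × 3 = 216 runs). A minimal sketch of how such a grid can be enumerated; `all_configs` is a hypothetical helper, not part of the Castor codebase:

```python
import itertools

# Hyper-parameter grid matching the sweep described above.
grid = {
    "decay": [0.99, 0.95],
    "lr": [5e-4, 1e-4],
    "batch_size": [8, 16],
    "momentum": [0, 0.15, 0.05],
    "rnn_hidden_dim": [128, 256, 512],
    "epochs": [10, 15, 20],
}

def all_configs(grid):
    """Yield every combination of grid values as a flat config dict."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(all_configs(grid))
print(len(configs))  # 216, matching the number of runs in the sweep
```

Each config dict would then be passed to one training run (e.g., mapped onto the CLI flags shown below).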
The best Pearson's r on the test set is 0.8707, which is 0.0077 lower than the original result. It is achieved with this setting: --decay 0.95 --lr 0.0005 --optimizer rmsprop --momentum 0 --epochs 15 --batch-size 8 --rnn-hidden-dim 256.
All of the "nearly best" results (e.g., 0.8678 and 0.8667) share the same settings: --lr 5e-4 --batch-size 8 --epochs 15.
I also ran some tests with SGD and Adam; their performance is 1 to 2 points lower than RMSprop's.
Good results!
So it is very close to the original paper, right? It means VDPWI-pytorch works!
Btw, in my experience, SGD + a well-chosen lr is usually the best setup.
Could you send a PR to update the README after you finish the tuning? @likicode
I've updated the README and sent a PR. @Victor0118
I re-ran the best parameter setting (Pearson's r 0.8707) 80 times with different random seeds. The 95% confidence interval is [0.8625, 0.8644]. Among these 80 runs, the highest Pearson's r is 0.8710, obtained with random seed 723.
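A confidence interval like the one above can be computed from the per-seed scores. A minimal sketch using a normal approximation for the mean (with n=80 the difference from a t-interval is negligible); `confidence_interval` and the example scores are hypothetical, not the actual 80 results:

```python
import math
import statistics

def confidence_interval(scores, z=1.96):
    """95% confidence interval for the mean of `scores`,
    using the normal approximation mean ± z * stdev / sqrt(n)."""
    mean = statistics.mean(scores)
    sem = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean - z * sem, mean + z * sem

# Hypothetical per-seed Pearson's r values for illustration only.
scores = [0.86, 0.87, 0.88]
lo, hi = confidence_interval(scores)
```

In practice one would collect the 80 test-set Pearson's r values (one per seed) into `scores` and report the resulting interval.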
The parameter setting is: --classifier vdpwi --lr 0.0005 --optimizer rmsprop --epochs 15 --momentum 0 --batch-size 8 --rnn-hidden-dim 256
| | Pearson's r | Spearman's ρ | MSE |
|---|---|---|---|
| Original paper | 0.8784 | 0.8199 | 0.2329 |
| Our result | 0.8710 | 0.8092 | 0.2501 |
I also ran the other promising parameter settings 10 times each with different random seeds, in case I had missed a good configuration. Two settings achieve an r value above 0.87: a) 0.8705 with 95% confidence interval [0.8621, 0.8667]; b) 0.8702 with 95% confidence interval [0.8588, 0.8682].
@lintool To sum up, our best result improves by about 2 points after parameter tuning and is very close to the result of the original Torch implementation.