Hyper-parameter tuning for VDPWI
According to @daemon, VDPWI works: https://github.com/castorini/Castor/tree/master/vdpwi
But its effectiveness is still below the state of the art because the hyper-parameters haven't been tuned yet.
The old implementation was about 0.5 points off for Pearson's r on the test set; now it's closer to 2. The biggest changes from the old implementation are the switch to torchtext and PyTorch 0.4. The model code itself hasn't changed.
I ran 216 experiments over the following grid: decay=[0.99, 0.95], lr=[5e-4, 1e-4], batch_size=[8, 16], momentum=[0, 0.15, 0.05], rnn_hidden_dim=[128, 256, 512], epochs=[10, 15, 20]. In all cases I use RMSprop for optimization, following the paper.
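The sweep above is an exhaustive grid (2 × 2 × 2 × 3 × 3 × 3 = 216 runs). A minimal sketch of how such a grid can be enumerated; `all_configs` is a hypothetical helper, not part of the Castor codebase:

```python
import itertools

# Hyper-parameter grid matching the sweep described above.
grid = {
    "decay": [0.99, 0.95],
    "lr": [5e-4, 1e-4],
    "batch_size": [8, 16],
    "momentum": [0, 0.15, 0.05],
    "rnn_hidden_dim": [128, 256, 512],
    "epochs": [10, 15, 20],
}

def all_configs(grid):
    """Yield every combination of grid values as a flat config dict."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(all_configs(grid))
print(len(configs))  # 216, matching the number of runs in the sweep
```

Each config dict would then be passed to one training run (e.g., mapped onto the CLI flags shown below).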
The best Pearson's r on the test set is 0.8707, which is 0.0077 lower than the original result. It is achieved with this setting: --decay 0.95 --lr 0.0005 --optimizer rmsprop --momentum 0 --epochs 15 --batch-size 8 --rnn-hidden-dim 256.
All of the "nearly best" results (e.g., 0.8678 and 0.8667) share the same settings: --lr 5e-4 --batch-size 8 --epochs 15.
I also ran some tests with SGD and Adam; their performance is 1 to 2 points lower than RMSprop's.
Good results!
So it is very close to the original paper, right? It means VDPWI-pytorch works!
Btw, in my experience, SGD + a well-chosen lr is usually the best setup.
Could you send a PR to update the README after you finish the tuning? @likicode
I've updated the README and sent a PR. @Victor0118
I re-ran the best parameter setting (Pearson's r 0.8707) 80 times with different random seeds. The 95% confidence interval is [0.8625, 0.8644]. Among these 80 runs, the highest Pearson's r is 0.8710, obtained with random seed 723.
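A confidence interval like the one above can be computed from the per-seed scores. A minimal sketch using a normal approximation for the mean (with n=80 the difference from a t-interval is negligible); `confidence_interval` and the example scores are hypothetical, not the actual 80 results:

```python
import math
import statistics

def confidence_interval(scores, z=1.96):
    """95% confidence interval for the mean of `scores`,
    using the normal approximation mean ± z * stdev / sqrt(n)."""
    mean = statistics.mean(scores)
    sem = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean - z * sem, mean + z * sem

# Hypothetical per-seed Pearson's r values for illustration only.
scores = [0.86, 0.87, 0.88]
lo, hi = confidence_interval(scores)
```

In practice one would collect the 80 test-set Pearson's r values (one per seed) into `scores` and report the resulting interval.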
The parameter setting is: --classifier vdpwi --lr 0.0005 --optimizer rmsprop --epochs 15 --momentum 0 --batch-size 8 --rnn-hidden-dim 256
| | Pearson's r | Spearman's ρ | MSE |
|---|---|---|---|
| Original paper | 0.8784 | 0.8199 | 0.2329 |
| Our result | 0.8710 | 0.8092 | 0.2501 |
I also ran the other promising parameter settings 10 times each with different random seeds, in case I had missed a good configuration. Two settings achieve an r value above 0.87: a) 0.8705 with 95% confidence interval [0.8621, 0.8667]; b) 0.8702 with 95% confidence interval [0.8588, 0.8682].
@lintool To sum up, our best result improves by about 2 points after parameter tuning and is very close to the result of the original Torch implementation.