Improve KimCNN results
One and a half years later, I'm finally getting better results on KimCNN using the original hyperparameters in the paper. There are a few discrepancies between Kim's original implementation and the PyTorch/Castor one:
- Kim used an Adadelta rho of 0.95 instead of 0.9. The paper did not mention this (see the first sketch below).
- Kim used Xavier uniform initialization for the convolution layers. The paper did not mention this either (see the first sketch below).
- Kim did not use the equivalent of torchtext's BucketIterator. This is a difference in Castor.
- Kim used the dev loss as the criterion for model selection. This is a difference in Castor (see the second sketch below).
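
For concreteness, here's a minimal PyTorch sketch of the first two fixes. The filter widths (3, 4, 5), 100 feature maps per width, and 300-d embeddings are the paper's hyperparameters; the module layout itself is just illustrative:

```python
import torch
import torch.nn as nn

# Convolution layers as in KimCNN: filter widths 3, 4, 5 over
# 300-d word embeddings, 100 feature maps each (layout illustrative).
convs = nn.ModuleList(nn.Conv2d(1, 100, (k, 300)) for k in (3, 4, 5))

# Xavier uniform initialization for the convolution layers,
# matching Kim's original implementation.
for conv in convs:
    nn.init.xavier_uniform_(conv.weight)
    nn.init.zeros_(conv.bias)

# Adadelta with rho=0.95; PyTorch's default is rho=0.9.
optimizer = torch.optim.Adadelta(convs.parameters(), rho=0.95)
```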
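And a hedged sketch of selecting the model by dev loss rather than dev accuracy. Here `model`, `train_iter`, `dev_iter`, `num_epochs`, and `train_one_epoch` are assumed to exist, and batches are assumed to carry `.text`/`.label` as in legacy torchtext:

```python
import copy
import math

import torch
import torch.nn.functional as F

def dev_loss(model, dev_iter):
    # Average cross-entropy over the dev set.
    model.eval()
    total, count = 0.0, 0
    with torch.no_grad():
        for batch in dev_iter:
            logits = model(batch.text)
            total += F.cross_entropy(logits, batch.label,
                                     reduction="sum").item()
            count += batch.label.size(0)
    model.train()
    return total / count

# Keep the checkpoint with the lowest dev loss for final testing.
best_loss, best_state = math.inf, None
for epoch in range(num_epochs):                    # num_epochs assumed
    train_one_epoch(model, train_iter, optimizer)  # hypothetical helper
    loss = dev_loss(model, dev_iter)
    if loss < best_loss:
        best_loss = loss
        best_state = copy.deepcopy(model.state_dict())
model.load_state_dict(best_state)
```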
After these changes, the original hyperparameters in the paper work quite well: I'm now getting 87.8 on SST-2 multichannel, an improvement over the current 87.4, though still a bit short of the paper's 88.1.
Reference: https://github.com/yoonkim/CNN_sentence/blob/master/conv_net_sentence.py
It seems I spoke too soon: results fluctuate from the high 85s to the 87s.