
Improve KimCNN results

Open · daemon opened this issue 5 years ago · 1 comment

One and a half years later, I'm finally getting better results on KimCNN using the original hyperparameters from the paper. There are a few discrepancies between Kim's original implementation and the Castor PyTorch implementation:

  • Kim used an Adadelta rho of 0.95 instead of 0.9. The paper does not mention this.
  • Kim used Xavier uniform initialization for the convolution layers. The paper does not mention this either.
  • Kim did not use the equivalent of torchtext's BucketIterator. Castor does, which is a difference.
  • Kim used the dev loss as the criterion for model selection. Castor does this differently.
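The optimizer, initialization, and model-selection points above can be sketched in PyTorch (the BucketIterator point is data-pipeline specific and omitted). This is a hedged illustration, not Castor's actual code: the single conv layer, the loop bounds, and the `dev_loss` placeholder are stand-ins for a real KimCNN model and dev-set evaluation.

```python
import copy
import torch
import torch.nn as nn

# Stand-in for one KimCNN convolution branch (filter width 3 over 300-d embeddings)
conv = nn.Conv2d(1, 100, kernel_size=(3, 300))
nn.init.xavier_uniform_(conv.weight)  # Xavier uniform init, as in Kim's code

# PyTorch's Adadelta defaults to rho=0.9; Kim's code effectively uses 0.95
optimizer = torch.optim.Adadelta(conv.parameters(), rho=0.95)

# Model selection on dev loss rather than dev accuracy
best_dev_loss = float("inf")
best_state = None
for epoch in range(2):              # placeholder training loop
    dev_loss = 1.0 / (epoch + 1)    # stand-in for a real dev-set evaluation
    if dev_loss < best_dev_loss:
        best_dev_loss = dev_loss
        best_state = copy.deepcopy(conv.state_dict())
```

The key detail is that `rho=0.95` must be passed explicitly, since PyTorch's default silently differs from Kim's setting.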

After these changes, the original hyperparameters in the paper work quite well. I'm getting 87.8 for SST-2 multichannel now, which is an improvement over the current 87.4. It's still a bit off from the paper result of 88.1, though.

Reference: https://github.com/yoonkim/CNN_sentence/blob/master/conv_net_sentence.py

daemon avatar Mar 10 '19 05:03 daemon

Seems like I spoke too soon. Results fluctuate between the high 85s and the 87s.

daemon avatar Mar 10 '19 08:03 daemon