16 comments by Yoon Kim

Cool stuff! I noticed on the README that you are using 100/150 hidden units for small/large models respectively. I actually use 300/650 hidden units, so this might explain the difference...

Ah, ok! A few other things that may matter:

- batch size
- parameter initialization

I think it should be a lot lower. I don't recall the numbers exactly but since the dataset is small and the model has a lot of capacity (even with...

It's because we do SGD with mini-batches, and each mini-batch has sentences of varying lengths. One could sort/group the batches based on sentence length, and then there would be no...
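A minimal sketch of the grouping idea mentioned above: bucket sentences by length before forming mini-batches, so every batch contains sentences of a single length and no padding is needed. The function name and batching scheme here are illustrative, not taken from the repo.

```python
from collections import defaultdict

def length_grouped_batches(sentences, batch_size):
    """Yield mini-batches where all sentences share the same length.

    Illustrative sketch: bucket tokenized sentences by length,
    then slice each bucket into batches of at most `batch_size`.
    """
    buckets = defaultdict(list)
    for sent in sentences:
        buckets[len(sent)].append(sent)
    for same_length in buckets.values():
        for i in range(0, len(same_length), batch_size):
            yield same_length[i:i + batch_size]

# Example with toy tokenized sentences of varying length
sents = [["a"], ["b", "c"], ["d", "e"], ["f"]]
batches = list(length_grouped_batches(sents, batch_size=2))
```

In practice one would also shuffle within each bucket per epoch so the model does not see sentences in a fixed order.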

Feel free to push if you've modified the code to get the GPU working, and I'll make sure to merge :)

That's correct, we go directly from the CNN output to the softmax, without any hidden layers.
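To make the "no hidden layers" point concrete, here is a hedged NumPy sketch (shapes and sizes are assumptions, not the repo's actual values): the max-pooled CNN feature vector feeds a softmax classifier directly, with no intermediate fully-connected layer.

```python
import numpy as np

rng = np.random.default_rng(0)

n_filters, n_classes = 300, 2            # assumed sizes for illustration
pooled = rng.standard_normal(n_filters)  # stand-in for max-pooled CNN features

# Softmax layer applied directly to the pooled features
W = rng.standard_normal((n_filters, n_classes)) * 0.01
b = np.zeros(n_classes)

logits = pooled @ W + b
probs = np.exp(logits - logits.max())    # numerically stable softmax
probs /= probs.sum()
y_pred = int(np.argmax(probs))
```

The only learned parameters after the convolution/pooling stage are `W` and `b`.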

You want to create (and compile) a theano function whose output is y_pred, given the input.

cool, feel free to send a pull request!

Hi, you can obtain all the datasets here: https://github.com/harvardnlp/sent-conv-torch. Phrases from word2vec were not taken into account.

There is randomness built into the models (due to initialization) so you shouldn't expect the nearest neighbors to be exactly the same. Your nearest neighbors seem to make sense (and...
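A small sketch of why runs differ, under assumed initialization details (the uniform range and function name here are illustrative): random initialization depends on the seed, so two runs with different seeds learn different vectors, while fixing the seed makes results reproducible.

```python
import numpy as np

def init_embeddings(vocab_size, dim, seed):
    """Illustrative random embedding initialization (range is an assumption)."""
    rng = np.random.default_rng(seed)
    return rng.uniform(-0.25, 0.25, size=(vocab_size, dim))

a = init_embeddings(100, 50, seed=0)  # run 1
b = init_embeddings(100, 50, seed=0)  # run 2, same seed: identical
c = init_embeddings(100, 50, seed=1)  # run 3, new seed: different
```

Since nearest neighbors are computed from the learned vectors, changing the seed shifts them slightly even when training is otherwise identical.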