I was looking through some of it yesterday and realized my `GESD` implementation was broken. The fixed one is in the repo now; try with that. It may give better...
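For reference, GESD is the geometric mean of a Euclidean-distance term and a sigmoid-dot-product term, as in the answer-selection papers. A minimal NumPy sketch of that formula (the `gamma` and `c` defaults here are just illustrative, not necessarily what the repo uses):

```python
import numpy as np

def gesd_similarity(x, y, gamma=1.0, c=1.0):
    # Geometric mean of Euclidean and Sigmoid Dot product:
    #   1 / (1 + ||x - y||)  *  1 / (1 + exp(-gamma * (x . y + c)))
    euclidean = 1.0 / (1.0 + np.linalg.norm(x - y))
    sigmoid_dot = 1.0 / (1.0 + np.exp(-gamma * (np.dot(x, y) + c)))
    return euclidean * sigmoid_dot
```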
I trained the attention model and printed out some predicted and expected answers, then dumped them in [this gist](https://gist.github.com/codekansas/9429ccfb1675da28f3186892180ba878). You guys can decide for yourself. I'm more or less ready...
I noticed the two scripts run for 2000000 (CNN) and 20000000 (LSTM+CNN) batches; it must have taken a really long time to train. The results I included were...
Wow, I did not realize the Teslas are so fast... I'll just run it for a while on my 980 Ti, I suppose. Character-level embeddings, though? It looks like regular...
I think the performance really depends on how long you run it. I ran a CNN-LSTM model for ~700 epochs and got a precision of 0.52; going to run it...
Ended up with

```
Best: Loss = 0.001460216869, Epoch = 879
2016-08-14 05:58:27 :: ----- test1 -----
[====================] Top-1 Precision: 0.564444 MRR: 0.680506
2016-08-14 06:17:06 :: ----- test2 -----
[====================] Top-1 Precision:...
```
17 days seems slow for that GPU? I wonder if it is slow for some reason; maybe it's running on the CPU instead of the GPU? But 3000 epochs \*...
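If you want to rule out the CPU-fallback possibility, a quick sanity check (this is just a sketch for a stock Keras install; adjust for whichever backend you're actually on):

```python
import keras.backend as K

# Rough check that the backend actually sees a GPU.
if K.backend() == 'tensorflow':
    from tensorflow.python.client import device_lib
    print([d.name for d in device_lib.list_local_devices()])  # expect a GPU device entry
elif K.backend() == 'theano':
    import theano
    print(theano.config.device)  # expect 'gpu*' or 'cuda*', not 'cpu'
```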
I fixed this just now. I think the output shape should just always be `(None, 1)`. The thing is, I don't think it made a difference. I think the `nan`...
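To be concrete about what I mean by the output shape: the score layer should emit one value per sample, so its shape is `(None, 1)`. A toy sketch with a `Lambda` (not the actual layer in the repo, just the shape idea):

```python
from keras.layers import Input, Lambda
import keras.backend as K

x = Input(shape=(100,))
# One score per sample in the batch, so the output shape is (None, 1).
score = Lambda(lambda t: K.sum(t, axis=-1, keepdims=True),
               output_shape=lambda s: (s[0], 1))(x)
```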
The names are the same; test 1 and test 2 should be the same as in the papers. The validation data is generated by splitting the training data.
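The split itself is nothing special; roughly this idea (the exact fraction and shuffling in the repo may differ):

```python
import random

# Stand-in for the real list of training question/answer pairs.
training_data = list(range(1000))

random.seed(42)
random.shuffle(training_data)

# Hold out roughly 10% of the training set for validation.
split = int(0.9 * len(training_data))
train_set, valid_set = training_data[:split], training_data[split:]
```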
I'm still trying things out. In `insurance_qa_eval.py` I loaded some pre-trained embeddings, but I haven't put the embeddings on GitHub yet. To generate them, I trained Gensim's Word2Vec model to...
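If you want to regenerate them in the meantime, the rough recipe looks like the sketch below (the hyperparameters and the output filename here are placeholders, not necessarily what `insurance_qa_eval.py` expects):

```python
import numpy as np
from gensim.models import Word2Vec

# Toy corpus standing in for the tokenized question/answer text.
sentences = [
    ["what", "does", "homeowners", "insurance", "cover"],
    ["how", "much", "does", "renters", "insurance", "cost"],
]

# Gensim 4.x parameter names; older releases use `size` instead of `vector_size`.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=4)

# Dump an embedding matrix keyed by the model's vocabulary order.
weights = np.vstack([model.wv[word] for word in model.wv.index_to_key])
np.save("word2vec_weights.npy", weights)  # placeholder filename
```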