16 comments by Yoon Kim

I think the discrepancy is due to sentence-level F1 (adopted by PRPN) vs. corpus-level F1 (adopted by EVALB and previous work). Thus the numbers are not exactly comparable, though they...
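For concreteness, here is a minimal sketch (not from either codebase) of how the two F1 variants diverge; the per-sentence bracket counts are made up for illustration:

```python
def f1(match, gold, pred):
    """F1 from counts of matched, gold, and predicted brackets."""
    p = match / pred if pred else 0.0
    r = match / gold if gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

# Hypothetical (matched, gold, predicted) triples for two sentences.
stats = [(3, 4, 4), (1, 5, 2)]

# Sentence-level F1 (PRPN-style): average the per-sentence F1 scores.
sent_f1 = sum(f1(*s) for s in stats) / len(stats)

# Corpus-level F1 (EVALB-style): pool the counts, then compute F1 once.
corpus_f1 = f1(*(sum(col) for col in zip(*stats)))

print(sent_f1, corpus_f1)  # ~0.518 vs. ~0.533: not directly comparable
```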

Hi, sorry I just saw this. I don't remember getting this issue in PyTorch 0.2. Which line in the code is triggering this?

Hi, I think the above error can be fixed by changing it to

```python
train_nll_vae += nll_vae.item() * batch_size
```

Hope this helps!
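For context, a self-contained toy showing why `.item()` matters here (the loss construction and loop are stand-ins, not the repo's code): accumulating the tensor itself would keep every batch's autograd graph alive, while `.item()` yields a plain Python float.

```python
import torch

train_nll_vae = 0.0
for _ in range(3):  # stand-in for the batch loop
    nll_vae = (torch.randn(5, requires_grad=True) ** 2).sum()  # toy loss
    batch_size = 5
    # .item() detaches and converts to a Python float, so the running
    # total doesn't hold onto the computation graph of each batch.
    train_nll_vae += nll_vae.item() * batch_size
print(train_nll_vae)
```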

For a GRU implementation, in addition to modifying make_lstm, you'll need some tinkering in the training code, as GRU doesn't have a cell state in addition to the hidden...
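To illustrate the return-value difference that forces those changes (a generic PyTorch sketch, not this repo's make_lstm):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(10, 20, batch_first=True)
gru = nn.GRU(10, 20, batch_first=True)
x = torch.randn(4, 7, 10)

# LSTM returns (output, (h_n, c_n)): hidden state plus cell state.
out_lstm, (h_n, c_n) = lstm(x)

# GRU returns (output, h_n) only, so any training code that unpacks
# (h, c) or re-feeds the cell state has to be adjusted.
out_gru, h_n_gru = gru(x)

print(out_lstm.shape, out_gru.shape)  # both torch.Size([4, 7, 20])
```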

I don't really understand why either, but empirically I've found this to be the case (vanilla SGD doesn't really work well with GRU).

Hmm, yeah, preprocess-shards should really be part of preprocess with a --shardsize option... I'll see if I have a chance to factor it out.
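As a rough sketch of what a --shardsize option inside preprocess might do (the function names and the saving step are illustrative, not the repo's actual API):

```python
def save_shard(shard, path):
    # Placeholder: the real preprocessing would serialize the shard here.
    print(f"writing {len(shard)} examples to {path}")

def write_shards(examples, shard_size, prefix="train"):
    """Split an example stream into fixed-size shards as it is read."""
    shard, idx = [], 0
    for ex in examples:
        shard.append(ex)
        if len(shard) == shard_size:
            save_shard(shard, f"{prefix}.{idx}.pt")
            shard, idx = [], idx + 1
    if shard:  # flush the final partial shard
        save_shard(shard, f"{prefix}.{idx}.pt")

write_shards(range(10), shard_size=4)  # -> shards of 4, 4, 2 examples
```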