Kyle Gao

5 comments by Kyle Gao

Thanks for bringing it up! I'll look into it.

[This quora post](https://www.quora.com/In-seq2seq-models-how-does-one-initialize-the-states-of-a-decoder-when-it-has-a-different-number-of-layers-from-the-encoder) lists two approaches. We can implement one or both of them.
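For concreteness, here is a minimal PyTorch sketch of one option in that spirit: a learned bridge layer that projects the encoder's final hidden states into initial states for a decoder with a different number of layers. The `Bridge` module, its constructor arguments, and the GRU-style `(layers, batch, hidden)` layout are illustrative assumptions, not part of this repo.

```python
import torch
import torch.nn as nn


class Bridge(nn.Module):
    """Sketch of a learned bridge between encoder and decoder states.

    Projects the encoder's final hidden states, shaped
    (encoder_layers, batch, hidden_size), into initial hidden states for a
    decoder with decoder_layers layers. For an LSTM, the cell states would
    need the same treatment.
    """

    def __init__(self, hidden_size, encoder_layers, decoder_layers):
        super().__init__()
        self.hidden_size = hidden_size
        self.decoder_layers = decoder_layers
        self.proj = nn.Linear(encoder_layers * hidden_size,
                              decoder_layers * hidden_size)

    def forward(self, encoder_hidden):
        batch = encoder_hidden.size(1)
        # Flatten the layer dimension, project, and reshape for the decoder.
        flat = encoder_hidden.transpose(0, 1).reshape(batch, -1)
        out = torch.tanh(self.proj(flat))
        return (out.view(batch, self.decoder_layers, self.hidden_size)
                   .transpose(0, 1)
                   .contiguous())
```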

I haven't finished the implementation yet, but you will have to move the probability mass of the non-OOV words from `copy_prob` to `vocab_prob`, similar to what is done [here](https://github.com/IBM/pytorch-seq2seq/blob/copy/seq2seq/models/CopyDecoder.py#L40).
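Roughly, the idea looks like this. This is a hedged sketch, not the linked code: the function name `merge_copy_into_vocab`, the tensor names, and the `(batch, src_len)` / `(batch, vocab_size)` shapes are assumptions for illustration.

```python
import torch


def merge_copy_into_vocab(vocab_prob, copy_prob, src_ids, unk_id):
    """Move the copy mass of in-vocabulary (non-OOV) source tokens into vocab_prob.

    vocab_prob: (batch, vocab_size)  generation distribution over the vocabulary
    copy_prob:  (batch, src_len)     copy/attention distribution over source positions
    src_ids:    (batch, src_len)     vocab id of each source token,
                                     with OOV source tokens mapped to unk_id
    """
    in_vocab = (src_ids != unk_id).float()
    moved = copy_prob * in_vocab                       # mass to move over
    new_vocab_prob = vocab_prob.scatter_add(1, src_ids, moved)
    new_copy_prob = copy_prob * (1.0 - in_vocab)       # keep only the OOV copy mass
    return new_vocab_prob, new_copy_prob
```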

Waiting for a response from `torchtext`: pytorch/text#127

Tested on WMT15's newstest13 set, German to English. Larger experiments are blocked by issue #27.