skip-thoughts
GitHub code (decoder GRU layer) differs from the paper
Hi,
I'm stepping through the code and noticed that the GRU layer of the decoder doesn't take the context provided by the encoder into account. As stated in the paper (Eqs. 5, 6 and 7), an extra parameter is added to incorporate the context at every time step during decoding.
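For reference, here is a minimal NumPy sketch of one decoder GRU step with the extra context term from Eqs. 5-7 of the paper. All parameter names (`Wr`, `Cr`, etc.) are illustrative, not the repo's; the `Cr`/`Cz`/`C` context projections are exactly what a plain GRU step omits:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cond_step(x, h_prev, ctx, p):
    """One decoder GRU step conditioned on the encoder context `ctx`
    at every time step (paper Eqs. 5-7). `p` holds the weights; the
    ctx @ C* terms are the extra context parameters."""
    r = sigmoid(x @ p["Wr"] + h_prev @ p["Ur"] + ctx @ p["Cr"])  # reset gate
    z = sigmoid(x @ p["Wz"] + h_prev @ p["Uz"] + ctx @ p["Cz"])  # update gate
    h_bar = np.tanh(x @ p["W"] + (r * h_prev) @ p["U"] + ctx @ p["C"])
    return (1.0 - z) * h_prev + z * h_bar  # interpolate old and candidate state

# toy dimensions: input 4, hidden 3, context 5
rng = np.random.default_rng(0)
dims = {"Wr": (4, 3), "Ur": (3, 3), "Cr": (5, 3),
        "Wz": (4, 3), "Uz": (3, 3), "Cz": (5, 3),
        "W":  (4, 3), "U":  (3, 3), "C":  (5, 3)}
p = {k: rng.standard_normal(s) * 0.1 for k, s in dims.items()}
h = gru_cond_step(rng.standard_normal(4), np.zeros(3),
                  rng.standard_normal(5), p)
print(h.shape)  # (3,)
```

Note that `ctx` is fixed across decoding steps, so every step gets a fresh path back to the encoder output rather than relying on it only through the initial state.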
I think the code is missing the following function (taken from neural machine translation example): https://github.com/kyunghyuncho/dl4mt-material/blob/master/session1/nmt.py#L352
I could add it myself, but I'm afraid I would miss something and therefore wouldn't be able to reproduce the results, especially given that training is supposed to take two weeks...
Best,
Fréderic
Hi Fréderic,
The pre-trained model on the main page was trained by conditioning on the encoder output at each time step (namely, it used the "gru_cond_simple_layer"). In the training code as it is now, the encoder output directly initializes the decoder's initial state (as in the "Sequence to Sequence Learning..." paper). I didn't notice any performance difference between the two, and it saves almost 200 lines of code.
See https://github.com/ryankiros/skip-thoughts/blob/master/training/model.py#L86, where dec_ctx (the encoder output) is used to initialize the state. Sorry, I should probably make this clear somewhere in the readme.
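For contrast, here is a minimal sketch of the scheme the current training code uses: the encoder output only initializes the decoder state, and each subsequent step is a plain GRU with no context term. The parameter names are illustrative, not taken from model.py:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_state(ctx, W_init, b_init):
    """Map the encoder output to the decoder's initial hidden state,
    roughly the role dec_ctx plays in training/model.py."""
    return np.tanh(ctx @ W_init + b_init)

def gru_step(x, h_prev, p):
    """Plain GRU step: no context term, so the encoder output only
    influences decoding through the initial state."""
    r = sigmoid(x @ p["Wr"] + h_prev @ p["Ur"])  # reset gate
    z = sigmoid(x @ p["Wz"] + h_prev @ p["Uz"])  # update gate
    h_bar = np.tanh(x @ p["W"] + (r * h_prev) @ p["U"])
    return (1.0 - z) * h_prev + z * h_bar

rng = np.random.default_rng(0)
d_in, d_h, d_ctx = 4, 3, 5
shapes = {"Wr": (d_in, d_h), "Ur": (d_h, d_h),
          "Wz": (d_in, d_h), "Uz": (d_h, d_h),
          "W":  (d_in, d_h), "U":  (d_h, d_h)}
p = {k: rng.standard_normal(s) * 0.1 for k, s in shapes.items()}
h = init_state(rng.standard_normal(d_ctx),
               rng.standard_normal((d_ctx, d_h)) * 0.1, np.zeros(d_h))
for x in rng.standard_normal((6, d_in)):  # unroll a few decoder steps
    h = gru_step(x, h, p)
print(h.shape)  # (3,)
```

The trade-off is exactly the one discussed above: the conditional variant gives the decoder direct access to the context at every step at the cost of three extra weight matrices per decoder, while this version keeps the step function identical to a standard GRU.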
Ryan