skip-thoughts
GitHub code (decoder GRU layer) differs from the paper
Hi,
I'm stepping through the code and noticed that the GRU layer of the decoder doesn't take the context provided by the encoder into account. As stated in the paper (Eqs. 5, 6 and 7), an extra parameter is added to incorporate the context at every time step during decoding.
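For reference, here is a minimal NumPy sketch of one decoder GRU step with the extra context term from Eqs. 5-7 of the paper. All parameter names (`Wr`, `Cr`, etc.) are illustrative, not the repo's; the `Cr`/`Cz`/`C` context projections are exactly what a plain GRU step omits:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cond_step(x, h_prev, ctx, p):
    """One decoder GRU step conditioned on the encoder context `ctx`
    at every time step (paper Eqs. 5-7). `p` holds the weights; the
    ctx @ C* terms are the extra context parameters."""
    r = sigmoid(x @ p["Wr"] + h_prev @ p["Ur"] + ctx @ p["Cr"])  # reset gate
    z = sigmoid(x @ p["Wz"] + h_prev @ p["Uz"] + ctx @ p["Cz"])  # update gate
    h_bar = np.tanh(x @ p["W"] + (r * h_prev) @ p["U"] + ctx @ p["C"])
    return (1.0 - z) * h_prev + z * h_bar  # interpolate old and candidate state

# toy dimensions: input 4, hidden 3, context 5
rng = np.random.default_rng(0)
dims = {"Wr": (4, 3), "Ur": (3, 3), "Cr": (5, 3),
        "Wz": (4, 3), "Uz": (3, 3), "Cz": (5, 3),
        "W":  (4, 3), "U":  (3, 3), "C":  (5, 3)}
p = {k: rng.standard_normal(s) * 0.1 for k, s in dims.items()}
h = gru_cond_step(rng.standard_normal(4), np.zeros(3),
                  rng.standard_normal(5), p)
print(h.shape)  # (3,)
```

Note that `ctx` is fixed across decoding steps, so every step gets a fresh path back to the encoder output rather than relying on it only through the initial state.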
I think the code is missing the following function (taken from neural machine translation example): https://github.com/kyunghyuncho/dl4mt-material/blob/master/session1/nmt.py#L352
I could add it myself, but I'm afraid I would miss something and therefore wouldn't be able to reproduce the results, especially given that training is supposed to take two weeks...
Best,
Fréderic
Hi Fréderic,
The pre-trained model on the main page was trained by conditioning on the encoder output at each time step (namely, it used the "gru_cond_simple_layer"). In the training code as it is now, the encoder output directly initializes the decoder's initial state (as in the "Sequence to Sequence Learning..." paper). I didn't notice any performance difference between the two, and it saves almost 200 lines of code.
See https://github.com/ryankiros/skip-thoughts/blob/master/training/model.py#L86, where dec_ctx (the encoder output) is used to initialize the state. Sorry, I should probably make this clear somewhere in the readme.
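For contrast, here is a minimal sketch of the scheme the current training code uses: the encoder output only initializes the decoder state, and each subsequent step is a plain GRU with no context term. The parameter names are illustrative, not taken from model.py:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_state(ctx, W_init, b_init):
    """Map the encoder output to the decoder's initial hidden state,
    roughly the role dec_ctx plays in training/model.py."""
    return np.tanh(ctx @ W_init + b_init)

def gru_step(x, h_prev, p):
    """Plain GRU step: no context term, so the encoder output only
    influences decoding through the initial state."""
    r = sigmoid(x @ p["Wr"] + h_prev @ p["Ur"])  # reset gate
    z = sigmoid(x @ p["Wz"] + h_prev @ p["Uz"])  # update gate
    h_bar = np.tanh(x @ p["W"] + (r * h_prev) @ p["U"])
    return (1.0 - z) * h_prev + z * h_bar

rng = np.random.default_rng(0)
d_in, d_h, d_ctx = 4, 3, 5
shapes = {"Wr": (d_in, d_h), "Ur": (d_h, d_h),
          "Wz": (d_in, d_h), "Uz": (d_h, d_h),
          "W":  (d_in, d_h), "U":  (d_h, d_h)}
p = {k: rng.standard_normal(s) * 0.1 for k, s in shapes.items()}
h = init_state(rng.standard_normal(d_ctx),
               rng.standard_normal((d_ctx, d_h)) * 0.1, np.zeros(d_h))
for x in rng.standard_normal((6, d_in)):  # unroll a few decoder steps
    h = gru_step(x, h, p)
print(h.shape)  # (3,)
```

The trade-off is exactly the one discussed above: the conditional variant gives the decoder direct access to the context at every step at the cost of three extra weight matrices per decoder, while this version keeps the step function identical to a standard GRU.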
Ryan