
The input to the model and the target are the same.

Open adityakrgupta25 opened this issue 4 years ago • 2 comments

Hi, I have been trying to understand the provided code and have certain concerns about the decoder.

main.py contains the following pieces of code

out = prepared_batch[2][:, :]
tar = prepared_batch[2][:, 1:]

I presume these are the decoder input and the target (and carry the same information content, i.e. the edit actions). out is then fed to edit_net, which does not seem to make sense.

output = edit_net(org, out, org_ids, org_pos, simp_ids)

Thereafter, in the decoder, the code manipulates the same out to create output_t, which is then returned as the result of EditNTS. I am unable to find the part of your code where the prediction of edit actions actually happens.

I am unable to understand: if your model is already being fed the edit actions, what exactly is it predicting?

adityakrgupta25 avatar Jun 16 '20 12:06 adityakrgupta25

Hi, @yuedongP , can you provide updates and elaborate on the above?

adityakrgupta25 avatar Jun 24 '20 16:06 adityakrgupta25

Hi, this is the standard way to train a seq2seq model with teacher forcing, where you shift the input and target by one and maximize the log-likelihood. (Please note that out starts at index 0 and tar starts at index 1; because of this shift, the input and target are not the same!)

out = prepared_batch[2][:, :]
tar = prepared_batch[2][:, 1:]
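The shift can be illustrated with a small sketch. This uses a hypothetical batch of edit-label IDs (not the repository's actual data or vocabulary) and plain list slicing to mirror the tensor slicing above:

```python
# Hypothetical batch of edit-label ID sequences; 0 = start token, 1 = stop token.
edit_labels = [[0, 5, 7, 9, 1]]

# Mirrors out = prepared_batch[2][:, :] and tar = prepared_batch[2][:, 1:]:
out = [seq[:] for seq in edit_labels]   # decoder input, starts at index 0
tar = [seq[1:] for seq in edit_labels]  # target, shifted left by one

# At step t the decoder is conditioned on out[b][t] and is scored on
# predicting tar[b][t], i.e. the label one position ahead of its input.
print(out)  # [[0, 5, 7, 9, 1]]
print(tar)  # [[5, 7, 9, 1]]
```

So the two slices overlap heavily but are offset by one position, which is exactly what teacher forcing needs.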

Here we feed the expert edit sequence during training (but at time step t, the model can only see the gold edit label of time step t-1), and if you look at the actual model (editNTS.py) you will see that the prediction is shifted by one during training.
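A minimal teacher-forcing loop can make the "sees gold up to t-1, predicts t" pattern concrete. This is a generic sketch with a toy stand-in for the decoder, not the actual code in editNTS.py:

```python
def train_step_teacher_forcing(gold_edits, predict_next):
    """One teacher-forcing pass over a gold edit sequence.

    gold_edits:   list of edit-label IDs including start/stop tokens.
    predict_next: toy stand-in for the decoder; maps a prefix to a predicted ID.
    Returns the (last-visible-input, target) pair scored at each step.
    """
    pairs = []
    for t in range(1, len(gold_edits)):
        seen = gold_edits[:t]     # the model only sees gold labels up to t-1
        target = gold_edits[t]    # it is scored on predicting the label at t
        _ = predict_next(seen)    # the loss would compare this to target
        pairs.append((seen[-1], target))
    return pairs

pairs = train_step_teacher_forcing([0, 5, 7, 9, 1], lambda prefix: prefix[-1])
print(pairs)  # [(0, 5), (5, 7), (7, 9), (9, 1)]
```

Every step conditions on gold history but is penalized on the next gold label, so the model is genuinely predicting even though the full sequence is supplied.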

During inference, however, the model does not have this gold information and has to predict one edit at a time, conditioning on its own previous outputs.
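By contrast, inference-time decoding can be sketched as a greedy autoregressive loop. The lookup table below is a toy stand-in for the trained decoder (hypothetical IDs, not the repository's vocabulary):

```python
def greedy_decode(predict_next, start_id=0, stop_id=1, max_len=20):
    """Autoregressive inference: each step is conditioned on the model's
    own previous predictions, since no gold edit labels are available."""
    seq = [start_id]
    while len(seq) < max_len:
        nxt = predict_next(seq)   # the model predicts one edit at a time
        seq.append(nxt)
        if nxt == stop_id:
            break
    return seq

# Toy "model": a fixed lookup from the last predicted ID to the next one.
toy = {0: 5, 5: 7, 7: 9, 9: 1}
print(greedy_decode(lambda s: toy[s[-1]]))  # [0, 5, 7, 9, 1]
```

The only structural difference from training is where the conditioning history comes from: gold labels with teacher forcing, the model's own outputs at inference.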

Many thanks for your question, and I hope this clarifies your concern.

YueDongCS avatar Jun 24 '20 16:06 YueDongCS