
Training question

Open yurkor opened this issue 8 years ago • 7 comments

Am I right in saying that this LSTM uses its own generated output as input during training? That could explain the low precision of the trained models compared to the TensorFlow implementation. See:

In many applications of sequence-to-sequence models, the output of the decoder at time t is fed back and becomes the input of the decoder at time t+1. At test time, when decoding a sequence, this is how the sequence is constructed. During training, on the other hand, it is common to provide the correct input to the decoder at every time-step, even if the decoder made a mistake before. Functions in seq2seq.py support both modes using the feed_previous argument. For example, let's analyze the following use of an embedding RNN model.

https://www.tensorflow.org/versions/r0.9/tutorials/seq2seq/index.html

This problem leads to repetitive words in the predictions, because the model tries to put the same common words in many places: [why ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as i i]. I get similar results with raw Keras and a softmax activation.

If, instead of accumulating its own errors, the model used the correct target output as its input during training (teacher forcing), the results could be better; this also seems to be the common way of training HMMs/CRFs.
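Roughly, this is the kind of wiring I mean, sketched with the plain Keras functional API (the layer names, sizes, and fit call are made up for illustration, and this is not this library's API):

from keras.models import Model
from keras.layers import Input, LSTM, Dense, Embedding

vocab_size, embed_dim, hidden_dim = 10000, 128, 256

# Encoder: read the source sequence and keep only its final states.
enc_in = Input(shape=(None,), dtype='int32')
enc_emb = Embedding(vocab_size, embed_dim)(enc_in)
_, state_h, state_c = LSTM(hidden_dim, return_state=True)(enc_emb)

# Decoder: during training it is fed the *correct* previous target token
# (teacher forcing), not its own prediction from the previous step.
dec_in = Input(shape=(None,), dtype='int32')
dec_emb = Embedding(vocab_size, embed_dim)(dec_in)
dec_seq = LSTM(hidden_dim, return_sequences=True)(dec_emb, initial_state=[state_h, state_c])
dec_out = Dense(vocab_size, activation='softmax')(dec_seq)

model = Model([enc_in, dec_in], dec_out)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
# model.fit([source_tokens, target_tokens_shifted_right], target_tokens_one_hot, ...)

At test time the decoder input at step t+1 would instead be the token predicted at step t, which is exactly the feed_previous switch described in the TensorFlow docs quoted above.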

Would it be possible to implement this?

yurkor avatar Jul 19 '16 16:07 yurkor

The weird "as as as as" you see is due to bad word embeddings. If there is enough interest, I could roll out an example that actually works.
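In the meantime, if the embeddings are the problem, one common workaround is to seed the Embedding layer with pretrained vectors instead of learning them from scratch, roughly like this (embedding_matrix is a hypothetical array of shape (vocab_size, embed_dim) built from e.g. GloVe):

from keras.layers import Embedding

# weights=[...] initializes the layer from the pretrained matrix;
# trainable=False keeps those vectors fixed during training.
emb = Embedding(vocab_size, embed_dim,
                weights=[embedding_matrix],
                trainable=False)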

farizrahman4u avatar Jul 19 '16 16:07 farizrahman4u

Yes, that would be super helpful. Something relatively advanced, like the subtitle-based conversation generation in https://arxiv.org/pdf/1506.05869.pdf or the headline generation in http://arxiv.org/abs/1512.01712.

yurkor avatar Jul 20 '16 05:07 yurkor

I am having a problem training a sequence-to-sequence model. I set up a simple model like this:

model = Seq2seq(
    batch_input_shape=(BATCH_SIZE, INPUT_SEQUENCE_LENGTH, TOKEN_REPRESENTATION_SIZE),  # pre-embedded input token vectors
    output_dim=len(target_index_to_token),   # size of the target vocabulary
    hidden_dim=HIDDEN_LAYER_DIMENSION,
    output_length=OUTPUT_SEQUENCE_LENGTH,    # length of the decoded sequence
    depth=1)

model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
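The training call then looks something like this (the array names and epoch count are placeholders; x_train matches the batch_input_shape above, and y_train is one-hot over the target vocabulary):

# x_train: shape (num_samples, INPUT_SEQUENCE_LENGTH, TOKEN_REPRESENTATION_SIZE)
# y_train: shape (num_samples, OUTPUT_SEQUENCE_LENGTH, len(target_index_to_token)), one-hot
model.fit(x_train, y_train, batch_size=BATCH_SIZE, nb_epoch=NUM_EPOCHS)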

The problem is that the predicted output tends to contain repeated words, and by the end of the training phase all words in the output sequence are the same. What do you think could be the reason for this? I don't think it is a word embedding issue, because I tried different word embeddings and the output still has the same problem.

One example of my output:

"$$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ unanimity unanimity unanimity unanimity unanimity 46 46 freshener freshener freshener freshener freshener freshener freshener freshener freshener freshener rabona rabona rabona rabona agoraphobia agoraphobia agoraphobia agoraphobia agoraphobia agoraphobia agoraphobia agoraphobia agoraphobia agoraphobia agoraphobia agoraphobia claimant claimant claimant claimant zooplankton zooplankton zooplankton zooplankton zooplankton zooplankton zooplankton unlabeled unlabeled unlabeled unlabeled"

Hopefully I can receive some comments or suggestions from you guys to resolve this issue. Thank you very much.

xuanchien avatar Jul 22 '16 03:07 xuanchien

Will write my own example soon.

farizrahman4u avatar Jul 22 '16 03:07 farizrahman4u

I have the same issue: several repeated words in the predicted output. Another issue is that the loss is always nan when I use categorical crossentropy as the objective. I would really appreciate a working example. Thanks!

nabihach avatar Jul 26 '16 19:07 nabihach

I am also getting the same issue. I am trying to solve a seq2seq problem using a Keras LSTM. The predicted output words match the most frequent words of the vocabulary built from the dataset. Not sure what the reason could be.

asksonu avatar Apr 03 '18 19:04 asksonu

If the output is runs of the same word, like "the the the the the", it is probably an underfitting problem, which means your learning rate might be too high.
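For example, you could compile with an explicitly smaller learning rate and see whether the repetition goes away (the value below is just a starting point to tune, not a recommendation):

from keras.optimizers import SGD

# Same compile call as above, but with an explicit learning rate
# an order of magnitude below the Keras SGD default of 0.01.
model.compile(loss='categorical_crossentropy',
              optimizer=SGD(lr=0.001),
              metrics=['accuracy'])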

andlyu avatar Sep 25 '18 10:09 andlyu