seq2seq
Training question
Am I right in saying that this LSTM uses its own generated output as input during training? This could explain the low precision of trained models compared to the TensorFlow implementation. See:
In many applications of sequence-to-sequence models, the output of the decoder at time t is fed back and becomes the input of the decoder at time t+1. At test time, when decoding a sequence, this is how the sequence is constructed. During training, on the other hand, it is common to provide the correct input to the decoder at every time-step, even if the decoder made a mistake before. Functions in seq2seq.py support both modes using the feed_previous argument. For example, let's analyze the following use of an embedding RNN model.
https://www.tensorflow.org/versions/r0.9/tutorials/seq2seq/index.html
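For concreteness, here is a minimal teacher-forcing sketch in plain Keras (functional API), where the decoder is trained on the ground-truth previous token instead of its own prediction. All layer sizes, variable names, and the data layout are assumptions for illustration, not the internals of this library.

from keras.models import Model
from keras.layers import Input, LSTM, Dense

NUM_TOKENS = 1000   # target vocabulary size (assumed)
HIDDEN_DIM = 256    # LSTM state size (assumed)

# Encoder: read the source sequence and keep only its final states.
encoder_inputs = Input(shape=(None, NUM_TOKENS))
_, state_h, state_c = LSTM(HIDDEN_DIM, return_state=True)(encoder_inputs)

# Decoder: at each time-step it receives the *correct* previous target token
# (teacher forcing) rather than its own prediction from the previous step.
decoder_inputs = Input(shape=(None, NUM_TOKENS))
decoder_lstm = LSTM(HIDDEN_DIM, return_sequences=True)
decoder_hidden = decoder_lstm(decoder_inputs, initial_state=[state_h, state_c])
decoder_outputs = Dense(NUM_TOKENS, activation='softmax')(decoder_hidden)
# (in older Keras versions, wrap the Dense layer in TimeDistributed)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

# Training: decoder_input_data is the target sequence shifted right by one
# time-step; decoder_target_data is the unshifted one-hot target sequence.
# model.fit([encoder_input_data, decoder_input_data], decoder_target_data, ...)

At test time you would instead decode one step at a time, feeding each predicted token back in as the next decoder input.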
This problem leads to repetitive words in the predictions, because the model tries to put the same common words in many places [why?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as i i]. I get similar results with raw Keras and a softmax activation.
If the model did not accumulate its own errors but instead used the correct target output as input during training, the results could be better; this also seems to be the common way of training HMM/CRF models.
Would it be possible to implement this?
The weird "as as as as" you see is due to bad word embeddings. If there is enough interest, I could roll out an example that actually works.
Yes, that would be super helpful. Something relatively advanced, like the subtitle-based generation in https://arxiv.org/pdf/1506.05869.pdf or the headline generation in http://arxiv.org/abs/1512.01712.
I am having a problem training a sequence-to-sequence model. I set up a simple model like this:
# (import assumed; the exact module path depends on the seq2seq library version)
# from seq2seq.models import Seq2seq

model = Seq2seq(
    batch_input_shape=(BATCH_SIZE, INPUT_SEQUENCE_LENGTH, TOKEN_REPRESENTATION_SIZE),
    output_dim=len(target_index_to_token),
    hidden_dim=HIDDEN_LAYER_DIMENSION,
    output_length=OUTPUT_SEQUENCE_LENGTH,
    depth=1)

model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
The problem is that the predicted output tends to contain repeated words, and by the end of my training phase all the words in the output sequence are the same. What do you think could be the reason for this? I don't think it is a word-embedding issue, because I tried different word embeddings and the output still has the same problem.
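For reference, a sketch of how the training data would typically be shaped for this setup; the array names and the one-hot encoding of the targets are assumptions:

import numpy as np

# Hypothetical arrays, shaped to match the model definition above.
X_train = np.zeros((BATCH_SIZE, INPUT_SEQUENCE_LENGTH, TOKEN_REPRESENTATION_SIZE), dtype='float32')
Y_train = np.zeros((BATCH_SIZE, OUTPUT_SEQUENCE_LENGTH, len(target_index_to_token)), dtype='float32')
# ... fill X_train with token vectors and Y_train with one-hot targets ...

model.fit(X_train, Y_train, batch_size=BATCH_SIZE, nb_epoch=10)  # epochs= in Keras 2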
One example of my output:
"$$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ $$$ unanimity unanimity unanimity unanimity unanimity 46 46 freshener freshener freshener freshener freshener freshener freshener freshener freshener freshener rabona rabona rabona rabona agoraphobia agoraphobia agoraphobia agoraphobia agoraphobia agoraphobia agoraphobia agoraphobia agoraphobia agoraphobia agoraphobia agoraphobia claimant claimant claimant claimant zooplankton zooplankton zooplankton zooplankton zooplankton zooplankton zooplankton unlabeled unlabeled unlabeled unlabeled"
Hopefully I can receive some comments or suggestions from you guys to resolve this issue. Thank you very much.
Will write my own example soon.
I have the same issue: several repeated words in the predicted output. Another issue is that the loss is always NaN when I use categorical crossentropy as the objective. I would really appreciate a working example. Thanks!
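Before blaming the model for the NaN loss, it may be worth a quick sanity check on the arrays being fed in; a sketch, where X_train / Y_train are hypothetical names for the padded inputs and one-hot targets:

import numpy as np

assert not np.isnan(X_train).any(), "NaN values in the input features"
assert not np.isnan(Y_train).any(), "NaN values in the targets"
# Each target time-step should be a valid one-hot distribution:
assert np.allclose(Y_train.sum(axis=-1), 1.0), "targets are not one-hot"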
I am also getting the same issue, trying to solve a seq2seq problem with a Keras LSTM. The predicted output words match the most frequent words of the vocabulary built from the dataset. Not sure what the reason could be.
If you get lines of the same word, like "the the the the the", it is probably an underfitting error, which means your learning rate might be too high.
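A minimal sketch of lowering the learning rate by passing an explicit optimizer instance instead of the string 'sgd'; the value 0.01 is only an illustrative starting point, not a recommendation from this thread:

from keras.optimizers import SGD

model.compile(loss='categorical_crossentropy',
              optimizer=SGD(lr=0.01),  # try smaller values if outputs collapse to one word
              metrics=['accuracy'])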