video_to_sequence

The sentences generated by the trained model are incomplete

Open chenxinpeng opened this issue 8 years ago • 4 comments

First, thanks for your hard work on the code; it's very generous of you to share it. :)

But when I use the trained model to generate sentences, I always get sentences like these:

a man is talking to a . a man is riding a . a man is playing with .

The sentences are mostly incomplete, as if they have been truncated.

Strangely, when I then used the coco-caption code to evaluate the generated sentences, the METEOR score was 27.7%, which is very close to the paper.
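
For reference, a minimal sketch of scoring a generated sentence with the coco-caption METEOR scorer (assuming the pycocoevalcap package layout; the repo's actual evaluation script may differ, and the ids and sentences below are made up):

from pycocoevalcap.meteor.meteor import Meteor

# Both dicts map an id to a list of sentences: the ground-truth references
# and the single generated hypothesis per video (example data only).
gts = {'vid1': ['a man is talking to a woman', 'a man talks to a woman']}
res = {'vid1': ['a man is talking to a']}

scorer = Meteor()
score, per_id_scores = scorer.compute_score(gts, res)
print('METEOR: %.3f' % score)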

So I want to know how to solve this problem. Can you give me some advice? I think the problem may be caused by the code.

Thank you for your assistance.

chenxinpeng avatar Dec 02 '16 09:12 chenxinpeng

Hello, how long does it take to train this model? Do I really need a GPU for that? I'm a beginner, thank you.

lcmaster-hx avatar Dec 07 '16 13:12 lcmaster-hx

@aoki1994 Hi, if you follow the parameters in the original code, training takes about 12 hours. For myself, I changed the code and set the hidden units in the LSTM to 1000, which takes about 24 hours. I strongly suggest you use a GPU; by the way, a GTX 1080 is enough.
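
For reference, the knobs mentioned above live near the top of model.py; the snippet below is only a rough sketch, so the exact names and default values in your copy may differ:

# Hyperparameters near the top of model.py (names and defaults are
# approximate; check your copy of the code). Raising dim_hidden to 1000
# is the change mentioned above and roughly doubles training time.
dim_image = 4096       # size of the per-frame CNN feature
dim_hidden = 1000      # LSTM hidden units (increased from the original value)
n_frame_step = 80      # number of frames sampled per video
batch_size = 100
learning_rate = 0.001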

chenxinpeng avatar Dec 13 '16 12:12 chenxinpeng

@chenxinpeng Thank you very much! I'll give it a try.

lcmaster-hx avatar Dec 18 '16 12:12 lcmaster-hx

For anyone still interested in the original problem, I believe it is caused by the following: in model.py, line 267 loads the captions for training. Unfortunately, they still contain '.' and ',' (unlike the preprocessed dictionary). Then, in line 268, the last word is dropped for some reason (maybe because of the final '.'?). As a result, the final word of each sentence is never learned during training.
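
As a quick toy illustration of the truncation (made-up caption and vocabulary, not the repo's actual data):

# Toy example showing why the final word is lost.
wordtoix = {'a': 0, 'man': 1, 'is': 2, 'riding': 3, 'horse': 4}
caption = 'a man is riding a horse.'

# Original behaviour: punctuation kept, last token sliced off with [:-1].
# (Even without [:-1], 'horse.' would be filtered out because of the trailing '.')
old = [wordtoix[w] for w in caption.lower().split(' ')[:-1] if w in wordtoix]
print(old)   # [0, 1, 2, 3, 0] -- 'horse' never reaches the model

# Fixed behaviour: strip '.' and ',' first, keep every token.
clean = caption.replace('.', '').replace(',', '')
new = [wordtoix[w] for w in clean.lower().split(' ') if w in wordtoix]
print(new)   # [0, 1, 2, 3, 0, 4] -- 'horse' is now included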

To fix this, modify lines 267 and 268 as follows:

current_captions = current_batch['Description'].values

# Remove '.' and ',' from each caption
for idx, cc in enumerate(current_captions):
    current_captions[idx] = cc.replace('.', '').replace(',', '')

# Remove the [:-1] in this line!
current_captions_ind = map(lambda cap: [wordtoix[word] for word in cap.lower().split(' ') if word in wordtoix], current_captions)

Disclaimer: I have not trained it yet, but the caption and mask now look correct. :) Also, make sure your threshold for preProBuildWordVocab is not too high if you are missing words...
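
For anyone who wants to do the same sanity check, a rough sketch along these lines should work (current_caption_matrix, current_caption_masks and ixtoword are assumed to be the names used in the training loop and in preProBuildWordVocab; adjust to your copy):

# Rough sanity check inside the training loop: decode the first caption
# of the batch and compare it to its mask. Variable names are assumptions
# based on the training loop; adjust to match your code.
row = current_caption_matrix[0]
mask = current_caption_masks[0]

decoded = [ixtoword.get(int(ix), '<unk>') for ix in row]
print(' '.join(decoded))   # the caption's final word should now appear
print(mask)                # ones should cover every real word, zeros the padding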

Edit: Trained for 200 epochs, can confirm that this fixes it!

agethen avatar Feb 06 '17 05:02 agethen