video_to_sequence

The sentences generated by the trained model are incomplete

Open chenxinpeng opened this issue 8 years ago • 4 comments

First, thanks for your hard work on the code; it's very generous of you to share it. :)

But when I use the trained model to generate sentences, I always get sentences like these:

a man is talking to a . a man is riding a . a man is playing with .

The sentences are mostly incomplete, as if they have been truncated.

Strangely, when I then used the coco-caption code to evaluate the generated sentences, the METEOR score was 27.7%, which is very close to the paper.
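
For reference, a minimal sketch of scoring a generated sentence with the coco-caption METEOR scorer (assuming the pycocoevalcap package layout; the repo's actual evaluation script may differ, and the ids and sentences below are made up):

from pycocoevalcap.meteor.meteor import Meteor

# Both dicts map an id to a list of sentences: the ground-truth references
# and the single generated hypothesis per video (example data only).
gts = {'vid1': ['a man is talking to a woman', 'a man talks to a woman']}
res = {'vid1': ['a man is talking to a']}

scorer = Meteor()
score, per_id_scores = scorer.compute_score(gts, res)
print('METEOR: %.3f' % score)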

So I want to know how to solve this problem. Can you give me some advice? I think the problem may be caused by the code.

Thank you for your assistance.

chenxinpeng avatar Dec 02 '16 09:12 chenxinpeng

Hello, how long does it take to train this model? Do I really need a GPU for that? I'm a beginner, thank you.

lcmaster-hx avatar Dec 07 '16 13:12 lcmaster-hx

@aoki1994 Hi, if you follow the parameters in the original code, training takes about 12 hours. For myself, I changed the code and set the hidden units in the LSTM to 1000, which takes about 24 hours. I strongly suggest you use a GPU; by the way, a GTX 1080 is enough.
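
For reference, the knobs mentioned above live near the top of model.py; the snippet below is only a rough sketch, so the exact names and default values in your copy may differ:

# Hyperparameters near the top of model.py (names and defaults are
# approximate; check your copy of the code). Raising dim_hidden to 1000
# is the change mentioned above and roughly doubles training time.
dim_image = 4096       # size of the per-frame CNN feature
dim_hidden = 1000      # LSTM hidden units (increased from the original value)
n_frame_step = 80      # number of frames sampled per video
batch_size = 100
learning_rate = 0.001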

chenxinpeng avatar Dec 13 '16 12:12 chenxinpeng

@chenxinpeng Thank you very much! I'll give it a try.

lcmaster-hx avatar Dec 18 '16 12:12 lcmaster-hx

For anyone still interested in the original problem, I believe it is caused by the following: in model.py, line 267 loads the captions for training. Unfortunately, they still contain '.' and ',' (unlike the preprocessed dictionary). Then, in line 268, the last word is dropped for some reason (maybe because of the final '.'?). As a result, the final word of each sentence is never learned during training.
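
As a quick toy illustration of the truncation (made-up caption and vocabulary, not the repo's actual data):

# Toy example showing why the final word is lost.
wordtoix = {'a': 0, 'man': 1, 'is': 2, 'riding': 3, 'horse': 4}
caption = 'a man is riding a horse.'

# Original behaviour: punctuation kept, last token sliced off with [:-1].
# (Even without [:-1], 'horse.' would be filtered out because of the trailing '.')
old = [wordtoix[w] for w in caption.lower().split(' ')[:-1] if w in wordtoix]
print(old)   # [0, 1, 2, 3, 0] -- 'horse' never reaches the model

# Fixed behaviour: strip '.' and ',' first, keep every token.
clean = caption.replace('.', '').replace(',', '')
new = [wordtoix[w] for w in clean.lower().split(' ') if w in wordtoix]
print(new)   # [0, 1, 2, 3, 0, 4] -- 'horse' is now included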

To fix this, modify lines 267 and 268 as follows:

current_captions = current_batch['Description'].values

# Remove '.' and ',' from each caption
for idx, cc in enumerate(current_captions):
    current_captions[idx] = cc.replace('.', '').replace(',', '')

# Remove the [:-1] in this line!
current_captions_ind = map(lambda cap: [wordtoix[word] for word in cap.lower().split(' ') if word in wordtoix], current_captions)

Disclaimer: I have not trained it yet, but the caption and mask now look correct. :) Also, make sure your threshold for preProBuildWordVocab is not too high if you are missing words...
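
For anyone who wants to do the same sanity check, a rough sketch along these lines should work (current_caption_matrix, current_caption_masks and ixtoword are assumed to be the names used in the training loop and in preProBuildWordVocab; adjust to your copy):

# Rough sanity check inside the training loop: decode the first caption
# of the batch and compare it to its mask. Variable names are assumptions
# based on the training loop; adjust to match your code.
row = current_caption_matrix[0]
mask = current_caption_masks[0]

decoded = [ixtoword.get(int(ix), '<unk>') for ix in row]
print(' '.join(decoded))   # the caption's final word should now appear
print(mask)                # ones should cover every real word, zeros the padding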

Edit: Trained for 200 epochs, can confirm that this fixes it!

agethen avatar Feb 06 '17 05:02 agethen