Semi-Supervised-Image-Captioning
TypeError
When I run "python train.py --saveto commoncraw_pretrained --dataset commoncrawl --cutoff 15", I get the following error:
Traceback (most recent call last):
File "train.py", line 341, in
How do I solve it? Thanks!
I will test and get back to you as soon as possible.
Thanks! Looking forward to your reply.
I already found the problem and fixed it by adding the following lines:
y = numpy.zeros((len(feat_list), options['cutoff'], options['semantic_dim']), dtype='float32')
for idx, ff in enumerate(feat_list):
    y[idx] = ff.reshape((-1, options['semantic_dim']))
These are the semantic features described in the paper; they should be 3-dimensional, (batch_size x cutoff x semantic_dimension). We use cutoff = 15, i.e. 15 detected words as input, and semantic_dimension = 300, i.e. 300-dimensional GloVe features.
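For concreteness, here is a minimal sketch of how such a (batch_size, cutoff, semantic_dim) tensor could be assembled from detected words and GloVe vectors; "glove" (a word-to-vector dictionary) and "detected_words_per_image" are hypothetical placeholders, not names from the repository.

import numpy

def build_semantic_feats(detected_words_per_image, glove, cutoff=15, semantic_dim=300):
    # Returns a (batch_size, cutoff, semantic_dim) float32 tensor of GloVe vectors.
    y = numpy.zeros((len(detected_words_per_image), cutoff, semantic_dim), dtype='float32')
    for idx, words in enumerate(detected_words_per_image):
        # Keep at most `cutoff` detected words per image; unused slots stay zero.
        for j, w in enumerate(words[:cutoff]):
            vec = glove.get(w)
            if vec is not None:
                y[idx, j] = vec
    return y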
Thanks for testing the code (I can't test it myself since I can't access the cluster at the moment). Feel free to ask if you have any further questions; I'd be glad to help.
Thanks, I'll try again!
When I run "python train.py --saveto commoncraw_pretrained --dataset commoncrawl --cutoff 15", the code runs fine at the beginning, but fails after 4379 updates:
Epoch 0, Updates: 4378, Cost is: 42.459641
Epoch 0, Updates: 4379, Cost is: 42.928699
Traceback (most recent call last):
File "train.py", line 341, in
I have already fixed the bug by changing this line in generate_caps.py:
sample, score, alpha = gen_sample(f_init[0], f_next[0], ctx_cutoff, cnn_feats[0],
When f_init is a list, ensemble decoding is triggered automatically. Besides, I have also uploaded the best model, "coco_bleu_best.zip"; unzip it to get a .pkl and a .npz file, and with those you can easily call generate_caps.py to reproduce the results reported in the paper.
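In case it helps, here is a minimal sketch of that single-model vs. ensemble choice, assuming f_init/f_next hold one compiled function per loaded model as in generate_caps.py; pick_decoder is a hypothetical helper, not part of the repository.

def pick_decoder(f_init, f_next, use_ensemble=False):
    # gen_sample switches to ensemble decoding when it receives lists of
    # functions, so single-model decoding needs the individual entries,
    # e.g. f_init[0] and f_next[0].
    if isinstance(f_init, list) and not use_ensemble:
        return f_init[0], f_next[0]
    return f_init, f_next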
Thanks for your testing; feel free to ask further questions.
Thank you for your help! I can run the code now, but I have a question: why is the computed CIDEr score so high (CIDEr: 3.819)? The best CIDEr score on the Microsoft COCO Image Captioning Challenge is only 1.146.
Did you use the MS-COCO dataset for testing (not commoncrawl)? Notice that my split could be different from your dataset split. Please refer to "ETHZ-Bootstrapped-Captioning/Data/coco/", where there are files named "caption-train/val/test.json": 5000/5000 images are used for val/test and the rest for training, following the split strategy from Karpathy's github. You should verify that your training data does not contain your validation data; otherwise, you might need to resplit your dataset.
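If you want to double-check your own split, a minimal sketch like the following could work, assuming each caption-*.json file is a list of entries carrying an image identifier; the field name and paths used below are assumptions and may differ from the actual files.

import json

def load_image_ids(path, key='image_id'):
    # Collect the set of image identifiers listed in one split file.
    with open(path) as f:
        return {entry[key] for entry in json.load(f)}

train_ids = load_image_ids('ETHZ-Bootstrapped-Captioning/Data/coco/caption-train.json')
val_ids = load_image_ids('ETHZ-Bootstrapped-Captioning/Data/coco/caption-val.json')
test_ids = load_image_ids('ETHZ-Bootstrapped-Captioning/Data/coco/caption-test.json')

# A clean split must share no images between training and val/test.
overlap = train_ids & (val_ids | test_ids)
print('overlapping images:', len(overlap))  # should print 0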
Oh, I see! Thanks!