bottom-up-attention
What features are used to train the VQA model? Do you use only the 2048-dimensional features?
In your code, the image_id, image_h, image_w, num_boxes, boxes, and features are extracted and saved. But in your paper, it seems that only the features are used to represent the image. Do you use embeddings of the predicted classes or the bounding boxes to train the VQA model?
No, we didn't use the class labels or the bounding boxes. I ran some initial experiments along those lines, but performance didn't change much. Mostly we used the boxes just for visualization.
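
For later readers: per the answer above, only the 2048-dimensional `features` array is fed to the VQA model; `boxes` are kept mainly for visualization. Below is a minimal sketch of decoding the saved TSV records and keeping just the features, assuming the field layout named in the question (`image_id, image_w, image_h, num_boxes, boxes, features`) with the array fields base64-encoded; the filename is a placeholder:

```python
import base64
import csv
import sys

import numpy as np

# The array fields can be large, so raise the csv field size limit.
csv.field_size_limit(sys.maxsize)

FIELDNAMES = ['image_id', 'image_w', 'image_h', 'num_boxes', 'boxes', 'features']

def load_features(tsv_path):
    """Yield (image_id, features) pairs; features has shape (num_boxes, 2048)."""
    with open(tsv_path) as f:
        reader = csv.DictReader(f, delimiter='\t', fieldnames=FIELDNAMES)
        for item in reader:
            num_boxes = int(item['num_boxes'])
            # Decode only the 2048-d per-region features; boxes are skipped,
            # since they are not used as model input.
            features = np.frombuffer(
                base64.b64decode(item['features']), dtype=np.float32
            ).reshape(num_boxes, 2048)
            yield item['image_id'], features

# Example usage (placeholder path):
# for image_id, features in load_features('trainval_features.tsv'):
#     ...  # feed `features` to the VQA model's image encoder
```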