What features are used to train a VQA model? Do you use only the 2048-dimensional features?

Open · cengzy14 opened this issue 6 years ago · 1 comment

In your code, image_id, image_h, image_w, num_boxes, boxes, and features are extracted and saved. But in your paper, it seems that only the features are used to represent the image. Do you also use embeddings of the predicted classes or the bounding boxes to train the VQA model?

cengzy14, Apr 16 '18 02:04

No, we didn't use the class labels or the bounding boxes. I ran some initial experiments along those lines, but performance didn't change much. We mostly used the boxes just for visualization.

peteanderson80, Apr 18 '18 13:04
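
For readers who want to reproduce the "features only" input described in the answer, here is a minimal sketch. It assumes each saved row is tab-separated with the six fields named in the question, and that `boxes` and `features` are base64-encoded float32 buffers. That encoding, the helper names (`encode_floats`, `decode_floats`, `load_region_features`), and the synthetic row are assumptions made for illustration, not taken from the thread. The sketch keeps only the 2048-dimensional region features and leaves `boxes` undecoded.

```python
import base64
import csv
import io
from array import array

# Field names as listed in the question; the tab-separated layout and the
# base64/float32 encoding are assumptions for this sketch.
FIELDNAMES = ["image_id", "image_h", "image_w", "num_boxes", "boxes", "features"]
FEATURE_DIM = 2048

# Encoded feature fields can exceed csv's default per-field size limit.
csv.field_size_limit(2**31 - 1)


def encode_floats(values):
    """Pack floats as float32 bytes and base64-encode them (assumed format)."""
    return base64.b64encode(array("f", values).tobytes()).decode("ascii")


def decode_floats(text):
    """Inverse of encode_floats: base64 text -> flat list of floats."""
    buf = array("f")
    buf.frombytes(base64.b64decode(text))
    return buf.tolist()


def load_region_features(tsv_text):
    """Return {image_id: [one 2048-d vector per region]}, ignoring boxes."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", fieldnames=FIELDNAMES
    )
    features_by_image = {}
    for row in reader:
        num_boxes = int(row["num_boxes"])
        flat = decode_floats(row["features"])
        if len(flat) != num_boxes * FEATURE_DIM:
            raise ValueError("unexpected feature buffer size")
        # Split the flat buffer into one vector per region.
        features_by_image[int(row["image_id"])] = [
            flat[i * FEATURE_DIM:(i + 1) * FEATURE_DIM] for i in range(num_boxes)
        ]
    return features_by_image


# Synthetic row standing in for one saved image (3 regions).
num_boxes = 3
row = "\t".join([
    "42", "480", "640", str(num_boxes),
    encode_floats([0.0] * (num_boxes * 4)),            # boxes: never decoded
    encode_floats([0.5] * (num_boxes * FEATURE_DIM)),  # region features
])
features = load_region_features(row + "\n")
print(len(features[42]), len(features[42][0]))
```

In this sketch the VQA model input for an image is just the `num_boxes × 2048` list of vectors. The box coordinates stay in the file and would only be decoded for visualization.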