lxmert
VQA dataset - how to calculate the label probability
Thank you for the great work.
While looking through the preprocessed VQA dataset in your repo, I noticed that each answer has a label probability.
Below are two entries for the same question: the first is from the original VQA dataset and the second is from your dataset.
{"question_type": "what color is the", "multiple_choice_answer": "green", "answers": [ {"answer": "gray", "answer_confidence": "yes", "answer_id": 1}, {"answer": "green", "answer_confidence": "maybe", "answer_id": 2}, {"answer": "green", "answer_confidence": "yes", "answer_id": 3}, {"answer": "black, white", "answer_confidence": "yes", "answer_id": 4}, {"answer": "green", "answer_confidence": "maybe", "answer_id": 5}, {"answer": "gray", "answer_confidence": "maybe", "answer_id": 6}, {"answer": "green", "answer_confidence": "no", "answer_id": 7}, {"answer": "brown", "answer_confidence": "yes", "answer_id": 8}, {"answer": "gray", "answer_confidence": "yes", "answer_id": 9}, {"answer": "green", "answer_confidence": "maybe", "answer_id": 10} ], "image_id": 131089, "answer_type": "other", "question_id": 131089000}
{ "answer_type": "other", "img_id": "COCO_val2014_000000131089", "label": { "brown": 0.3, "gray": 0.9, "green": 1 }, "question_id": 131089000, "question_type": "what color is the", "sent": "What color is the grass in this picture?" }
I am wondering how you assign the probabilities to each answer.
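The label probability is computed from the official VQA evaluation metric: https://visualqa.org/evaluation.html
The "accuracy" of an answer is defined as:
acc(answer) = min{ #answer / 3, 1}
where #answer is the number of annotators who gave that answer. The official evaluation averages this value over all 10 subsets of 9 annotators, which is why 3 matching annotators give 0.9 rather than 1.0.
The result can be written as a mapping like: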
#answer    prob
0          0.0
1          0.3
2          0.6
3          0.9
4          1.0
5          1.0
...
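To show where 0.3 / 0.6 / 0.9 come from, here is a minimal sketch rather than the code used in this repo. It assumes, as I read the evaluation page, that min(#answer / 3, 1) is averaged over every leave-one-out subset of the 10 annotators. The function name `vqa_soft_score` and its parameters are made up for illustration.

```python
from fractions import Fraction


def vqa_soft_score(n_matches, n_annotators=10):
    """Average min(#matches / 3, 1) over all leave-one-out annotator subsets.

    Hypothetical helper: assumes the metric is averaged over every subset
    that drops exactly one of the `n_annotators` human answers.
    """
    total = Fraction(0)
    for held_out in range(n_annotators):
        # Treat the first `n_matches` annotators as the ones who gave this
        # answer; holding one of them out lowers the match count by one.
        remaining = n_matches - (1 if held_out < n_matches else 0)
        total += min(Fraction(remaining, 3), Fraction(1))
    return float(total / n_annotators)


# Reproduce the table for 0..5 matching annotators.
mapping = {n: round(vqa_soft_score(n), 2) for n in range(6)}
print(mapping)
```

Under that assumption the loop reproduces the table: an answer given by 3 annotators scores 1.0 in the 7 subsets that keep all three and 2/3 in the 3 subsets that drop one of them, which averages to 0.9.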
I borrowed the get_score code from Hengyuan Hu's bottom-up-attention-vqa. I am not sure whether that is the original version, but it helped a lot along the way. I would like to give credit to the original authors :).
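For reference, a rough sketch of how a lookup like this can turn the 10 human answers into a label dict. This is an illustration written for this thread, not the actual preprocessing code: `build_label` and its `vocab` argument are hypothetical. The `vocab` filter is an assumption, and one possible reason why "black, white" is missing from the label in the example above.

```python
from collections import Counter

# Lookup form of the table: number of matching annotators -> soft score.
SCORE_TABLE = {0: 0.0, 1: 0.3, 2: 0.6, 3: 0.9}


def get_score(occurrences):
    """Return the soft score for an answer given by `occurrences` annotators."""
    # Four or more matching annotators all map to 1.0.
    return SCORE_TABLE.get(occurrences, 1.0)


def build_label(answers, vocab=None):
    """Hypothetical helper: count each answer string and score it.

    `vocab` is an assumed candidate-answer set; answers outside it are dropped.
    """
    counts = Counter(a["answer"] for a in answers)
    return {
        ans: get_score(n)
        for ans, n in counts.items()
        if vocab is None or ans in vocab
    }


# The 10 human answers from the example question (question_id 131089000).
answers = [
    {"answer": a}
    for a in ["gray", "green", "green", "black, white", "green",
              "gray", "green", "brown", "gray", "green"]
]
label = build_label(answers, vocab={"brown", "gray", "green"})
print(label)
```

With the assumed vocabulary this gives brown 0.3, gray 0.9, and green 1.0, matching the label in the preprocessed entry.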
Hi @airsplay,
I see there is no "color" value for "answer_type" in your dataset. Have you combined the "other" and "color" answer types from the original VQA 2.0 dataset?
Thanks.