Data sampling of InternVL2.5-MPO
Regarding the phrase "For instructions with clear ground truths" in Section 3.1 of the paper: how do the authors evaluate whether a generated response matches the ground truth? Thanks.
Hello,
As stated in the paper, we force the model to end its response with the final answer in the form 'Final Answer: xxx'. We can then extract that final answer and compare it against the ground truth with rule-based matching to determine whether the response is correct.
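For readers who want a concrete picture, here is a minimal sketch of that extraction step. It is not the script used for InternVL2.5-MPO. The names `FINAL_ANSWER_RE`, `extract_final_answer`, and `is_correct` are hypothetical, and the comparison (case-insensitive, trailing period ignored) is an assumption chosen for illustration.

```python
import re

# Hypothetical pattern: captures the rest of the line after "Final Answer:".
FINAL_ANSWER_RE = re.compile(r"Final Answer:\s*(.+)", re.IGNORECASE)


def extract_final_answer(response: str):
    """Return the text after the last 'Final Answer:' marker, or None if absent."""
    matches = FINAL_ANSWER_RE.findall(response)
    if not matches:
        return None
    # Use the last occurrence, since the marker is expected at the end.
    return matches[-1].strip()


def is_correct(response: str, ground_truth: str) -> bool:
    """Assumed rule: case-insensitive string match, ignoring a trailing period."""
    answer = extract_final_answer(response)
    if answer is None:
        return False  # a response without the marker is treated as incorrect

    def canon(text: str) -> str:
        return text.strip().lower().rstrip(".")

    return canon(answer) == canon(ground_truth)
```

With this sketch, a response ending in `Final Answer: b.` would count as matching a ground truth of `B`, while a response that never emits the marker would be counted as wrong.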
Thank you for your response! Do you mean exact matching? The model's final answer might have the same meaning as the ground truth but a different form, such as 0.3 and 3/10, or "yes" and "yep". How do you handle matching in cases like this? Is there a script in the repository for this? (I couldn't find one.)
Thank you for your interest in our work. We have released our data pipeline and source prompts; you can refer to them for more details. Briefly, we extract prompts from datasets with short or numerical answers to ensure that the answer can be judged using rule-based methods.
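As an illustration of why short or numerical answers are easy to judge with rules, here is a rough sketch of a normalizing comparison that handles cases such as 0.3 vs 3/10 and "yes" vs "yep". This is an assumption about what such a rule could look like, not the released pipeline. The synonym sets and the function names `normalize` and `answers_match` are hypothetical.

```python
from fractions import Fraction

# Hypothetical synonym sets; a real pipeline may use different ones or none.
YES_WORDS = {"yes", "yep", "yeah", "true"}
NO_WORDS = {"no", "nope", "false"}


def normalize(answer: str):
    """Map an answer to a canonical form: 'yes'/'no', an exact Fraction, or text."""
    text = answer.strip().lower().rstrip(".")
    if text in YES_WORDS:
        return "yes"
    if text in NO_WORDS:
        return "no"
    try:
        # Fraction parses both decimals ("0.3") and ratios ("3/10") exactly.
        return Fraction(text.replace(" ", ""))
    except (ValueError, ZeroDivisionError):
        return text  # not numeric: fall back to the cleaned string


def answers_match(prediction: str, ground_truth: str) -> bool:
    """Compare two answers after normalization."""
    return normalize(prediction) == normalize(ground_truth)
```

Using exact fractions rather than floats avoids rounding issues when comparing decimals with ratios. Answers that are long or free-form would still need a different judging method, which is consistent with restricting the prompts to short or numerical answers.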