InternVL
Evaluation results of InternVL2_5-2B on GSM8K don't match those in the paper
Table 13 of your paper "Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling" reports a GSM8K (4-shot) score of about 55 for InternVL2_5-2B, but when I ran the evaluation myself I only got around 37.
I suspect the prompt I am using with InternVL2_5-2B is not the most effective one. Could you share some example prompts that guide InternVL2_5 toward its best answers?
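
To make the question concrete, a 4-shot GSM8K setup might look roughly like the sketch below. Everything in it is an assumption for illustration: the four exemplars are made up (a real run would draw them from the GSM8K training split), the `Question:`/`Answer:` template and the "The answer is N." convention are just one common choice, and the helper names `build_prompt`, `extract_answer`, and `is_correct` are hypothetical. It is not the paper's evaluation protocol. It covers only prompt construction and answer parsing, since both the template and the parsing rule can move the score noticeably.

```python
import re

# Hypothetical exemplars, for illustration only. A real evaluation would
# take four (question, reasoning, answer) triples from the GSM8K train split.
FEW_SHOT = [
    ("A baker makes 12 rolls and sells 5. How many rolls are left?",
     "The baker starts with 12 rolls and sells 5, so 12 - 5 = 7.", "7"),
    ("A box holds 6 pencils. How many pencils are in 4 boxes?",
     "Each box holds 6 pencils, so 4 boxes hold 6 * 4 = 24.", "24"),
    ("Sara reads 10 pages a day for 3 days. How many pages does she read?",
     "She reads 10 pages each day for 3 days, so 10 * 3 = 30.", "30"),
    ("Tom has 3 apples and buys 2 more. How many apples does he have?",
     "Tom has 3 apples and buys 2 more, so 3 + 2 = 5.", "5"),
]

# Matches integers and decimals, optionally with thousands separators.
_NUMBER = r"-?\d[\d,]*(?:\.\d+)?"


def build_prompt(question, shots=FEW_SHOT):
    """Concatenate the exemplars and the target question into one prompt."""
    parts = [
        f"Question: {q}\nAnswer: {reasoning} The answer is {answer}."
        for q, reasoning, answer in shots
    ]
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)


def extract_answer(completion):
    """Return the predicted number as a string, or None if none is found."""
    # The model may keep generating further questions; keep only the first answer.
    completion = completion.split("Question:")[0]
    match = re.search(r"The answer is\s*\$?(" + _NUMBER + ")", completion)
    if match:
        text = match.group(1)
    else:
        # Fall back to the last number that appears in the completion.
        numbers = re.findall(_NUMBER, completion)
        if not numbers:
            return None
        text = numbers[-1]
    return text.replace(",", "")


def is_correct(completion, gold):
    """Compare the extracted prediction with the gold answer numerically."""
    predicted = extract_answer(completion)
    if predicted is None:
        return False
    try:
        return abs(float(predicted) - float(gold.replace(",", ""))) < 1e-6
    except ValueError:
        return False
```

With a setup like this, the string returned by `build_prompt(question)` would be sent to the model and the generated text passed to `is_correct` together with the number that follows `####` in the GSM8K reference solution. Whether the prompt is wrapped in the model's chat template, and how strict the extraction rule is, are exactly the details I would like to confirm.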