InternVL
MMStar dataset
I deployed the InternVL3_5-241B-A28B model and evaluated it on the MMStar benchmark, but I only got 73.76% accuracy. How did the paper reach 77.9%? Should I enable Thinking Mode, and which generation config parameters were used?
Thanks for raising this. In our experiments for the paper, we enabled Thinking Mode when evaluating InternVL3_5-241B-A28B on the MMStar benchmark. The unified Thinking Mode generation parameters are as follows:
```
max_new_tokens = 65536
do_sample = True
temperature = 0.6
top_p = 0.95
```
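For reference, a minimal sketch of how these parameters can be passed as a plain dict, following the `model.chat(...)` usage shown on InternVL model cards. The commented-out `chat` call is illustrative; the exact signature may differ across InternVL versions.

```python
# Thinking Mode generation parameters from the reply above, as a dict
# suitable for InternVL's chat API.
generation_config = dict(
    max_new_tokens=65536,  # large budget so the thinking trace is not truncated
    do_sample=True,        # sampling rather than greedy decoding
    temperature=0.6,
    top_p=0.95,
)

# Illustrative call (model/tokenizer/pixel_values/question defined elsewhere):
# response = model.chat(tokenizer, pixel_values, question, generation_config)
```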