InternVL
MMStar dataset
I deployed the InternVL3_5-241B-A28B model and evaluated it on the MMStar benchmark, but I only got 73.76% accuracy. How did the paper reach 77.9%? Should I enable Thinking Mode, and which generation config parameters were used?
Thanks for raising this. In our experiments for the paper, we enabled Thinking Mode when evaluating InternVL3_5-241B-A28B on the MMStar benchmark. The unified Thinking Mode generation parameters are as follows:
```
max_new_tokens = 65536
do_sample = True
temperature = 0.6
top_p = 0.95
```
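For reference, a minimal sketch of how these parameters can be passed as a plain dict, following the `model.chat(...)` usage shown on InternVL model cards. The commented-out `chat` call is illustrative; the exact signature may differ across InternVL versions.

```python
# Thinking Mode generation parameters from the reply above, as a dict
# suitable for InternVL's chat API.
generation_config = dict(
    max_new_tokens=65536,  # large budget so the thinking trace is not truncated
    do_sample=True,        # sampling rather than greedy decoding
    temperature=0.6,
    top_p=0.95,
)

# Illustrative call (model/tokenizer/pixel_values/question defined elsewhere):
# response = model.chat(tokenizer, pixel_values, question, generation_config)
```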