MMMU val evaluation
Hi guys,
Amazing work you guys have done here!
However, when I try to reproduce the MMMU benchmark with your model, I get a low overall accuracy (0.35). My evaluation calls the .chat() function directly:
```python
msgs = [{'role': 'user', 'content': text}]
response = model.chat(
    image=image,  # PIL.Image
    msgs=msgs,
    tokenizer=model.tokenizer,
    sampling=False,
)
```
I suspect you're doing something MMMU-specific, because other benchmarks like MathVista and OCRBench reproduce just fine.
Could you kindly share your evaluation setup for MMMU? I'm using the latest MiniCPM-Llama3-V-2_5 from Hugging Face.
Thanks in advance!
https://github.com/open-compass/VLMEvalKit
You can try this implementation. We process the input as interleaved images and text, and we use the VLMEvalKit scoring method so the results are comparable with other models. @LongIslandWithoutIceTea
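For reference, interleaving here means splitting the question text on its image placeholder tokens and inserting the corresponding images at those positions, rather than passing one image plus flat text. Below is a minimal sketch of that idea; the `<image N>` placeholder format and the helper `interleave` are assumptions for illustration, not the exact VLMEvalKit code:

```python
import re

def interleave(text, images):
    """Split text on <image N> placeholders and insert the matching
    images, producing a mixed list of strings and image objects.
    (Sketch only; the placeholder format is an assumption.)"""
    parts = re.split(r'<image (\d+)>', text)
    content = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            # Odd indices hold the captured image number (1-based).
            content.append(images[int(part) - 1])
        elif part.strip():
            content.append(part.strip())
    return content

# Example with string stand-ins for PIL images:
msgs = [{'role': 'user',
         'content': interleave('Refer to <image 1>. What is shown?', ['IMG1'])}]
```

The resulting `content` list mixes text segments and images in question order, which is what an interleaved-input chat API would consume.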