MMMU val evaluation
Hi guys,
Amazing work you guys have done here!
However, when I try to reproduce the MMMU benchmark with your model, I get a low overall accuracy (0.35). My evaluation calls the .chat() function directly:
```python
msgs = [{'role': 'user', 'content': text}]
response = model.chat(
    image=image,  # PIL.Image
    msgs=msgs,
    tokenizer=model.tokenizer,
    sampling=False,
)
```
I suspect you're doing something MMMU-specific, because other benchmarks like MathVista and OCRBench reproduce just fine.
Could you kindly share your evaluation setup for MMMU? I'm using the latest MiniCPM-Llama3-V-2_5 from Hugging Face.
Thanks in advance!
https://github.com/open-compass/VLMEvalKit
You can try this implementation. We process the input as interleaved images and text, and we use the VLMEvalKit scoring method so the results are comparable with other models. @LongIslandWithoutIceTea
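For reference, interleaving here means splitting the question text on its image placeholder tokens and inserting the corresponding images at those positions, rather than passing one image plus flat text. Below is a minimal sketch of that idea; the `<image N>` placeholder format and the helper `interleave` are assumptions for illustration, not the exact VLMEvalKit code:

```python
import re

def interleave(text, images):
    """Split text on <image N> placeholders and insert the matching
    images, producing a mixed list of strings and image objects.
    (Sketch only; the placeholder format is an assumption.)"""
    parts = re.split(r'<image (\d+)>', text)
    content = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            # Odd indices hold the captured image number (1-based).
            content.append(images[int(part) - 1])
        elif part.strip():
            content.append(part.strip())
    return content

# Example with string stand-ins for PIL images:
msgs = [{'role': 'user',
         'content': interleave('Refer to <image 1>. What is shown?', ['IMG1'])}]
```

The resulting `content` list mixes text segments and images in question order, which is what an interleaved-input chat API would consume.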