MiniCPM-V

What are the evaluation plans for each modality?

Open bobo0810 opened this issue 10 months ago • 3 comments

  1. Image Understanding: the official VLMEvalKit library ✅
  2. Multi-image and Video Understanding ❓
  3. Audio Understanding ❓
  4. Speech Generation ❓
  5. End-to-end Voice Cloning ❓
  6. Multimodal Live Streaming ❓

bobo0810 avatar Feb 14 '25 02:02 bobo0810

Hi @bobo0810 ,

Thanks for your question! I will provide some details specifically for Multimodal Live Streaming, as you requested.

For evaluating MiniCPM-o 2.6's capabilities in Multimodal Live Streaming, you can refer to the following repository: https://github.com/THUNLP-MT/StreamingBench. This repository provides a comprehensive benchmark and evaluation framework for streaming MLLMs.

Here's a step-by-step guide to reproducing the MiniCPM-o 2.6 results on StreamingBench:

  1. Inference Code: The inference code for MiniCPM-o 2.6 is located within the src/model directory of the StreamingBench repository.

  2. Evaluation Pipeline: Follow the "Evaluation Pipeline" instructions in the StreamingBench repository's README. This involves three main stages:

    • Data Preparation
    • Model Preparation
    • Evaluation

By following these steps and the instructions within the StreamingBench repository, you should be able to fully reproduce the evaluation results for MiniCPM-o 2.6 on Multimodal Live Streaming. Let me know if you have any further questions!

mjuicem avatar Feb 14 '25 21:02 mjuicem

@mjuicem Thank you very much. May I ask how to reproduce the metrics for multi-image, video, and audio understanding?

bobo0810 avatar Feb 15 '25 11:02 bobo0810

@lihytotoro, please advise on evaluation for multi-image, video, and audio.

For audio evaluation, see https://github.com/OpenBMB/UltraEval-Audio.

Cuiunbo avatar Feb 17 '25 02:02 Cuiunbo
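
For readers skimming the thread, the pointers given so far can be collected into a small lookup. This is only a sketch: the names `EVAL_TOOLS` and `unanswered` are hypothetical, the entries are taken solely from the comments above, and `None` marks a modality for which no tool was named in this thread.

```python
from typing import Optional

# Hypothetical summary of this thread; not an official mapping.
# Values are the tool names or URLs exactly as mentioned in the comments.
# None means no evaluation tool was named in the thread for that modality.
EVAL_TOOLS: dict[str, Optional[str]] = {
    "Image Understanding": "VLMEvalKit",
    "Multi-image and Video Understanding": None,
    "Audio Understanding": "https://github.com/OpenBMB/UltraEval-Audio",
    "Speech Generation": None,
    "End-to-end Voice Cloning": None,
    "Multimodal Live Streaming": "https://github.com/THUNLP-MT/StreamingBench",
}


def unanswered(tools: dict[str, Optional[str]]) -> list[str]:
    """Return the modalities that still have no evaluation pointer."""
    return [modality for modality, tool in tools.items() if tool is None]


if __name__ == "__main__":
    for modality, tool in EVAL_TOOLS.items():
        print(f"{modality}: {tool or 'not answered in this thread'}")
    print("Still open:", ", ".join(unanswered(EVAL_TOOLS)))
```

Under that reading of the thread, three of the six modalities from the original question still have no evaluation pointer.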