VLMEvalKit

[Benchmark] Support Video MCQ with TaskMeAnything-v1-video-random as an example

Open • weikaih04 opened this issue • 5 comments

Hi,

In this PR:

  1. I added the TaskMeAnything-v1-video-random video benchmark, which contains 2,700 video multiple-choice questions (MCQs).
  2. While adding the benchmark, I noticed that, unlike image MCQ, the video datasets have no video_mcq.py file, which makes it harder to add other video MCQ benchmarks. I therefore implemented video_mcq.py following the logic of image_mcq.py.
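To illustrate the kind of per-row logic an MCQ dataset class handles, here is a minimal sketch of turning one TSV row into a text prompt: collect the lettered option columns that are present and list them under the question. This is a hypothetical illustration, not the code in image_mcq.py or video_mcq.py; the function name `build_mcq_prompt`, the column names (`question`, `A`, `B`, ...), and the prompt wording are all assumptions. A real loader built on pandas would also need to treat NaN as "option absent".

```python
import string


def build_mcq_prompt(row: dict) -> str:
    """Hypothetical sketch: format one MCQ row as a text prompt.

    Assumes option columns are named by single uppercase letters ("A", "B", ...)
    and that an option is absent when its column is missing or empty.
    """
    # Keep only the lettered columns that actually hold an option.
    options = {
        letter: row[letter]
        for letter in string.ascii_uppercase
        if letter in row and row[letter] not in (None, "")
    }
    prompt = f"Question: {row['question']}\n"
    if options:
        prompt += "Options:\n"
        for letter, text in options.items():
            prompt += f"{letter}. {text}\n"
        prompt += "Please select the correct answer from the options above.\n"
    return prompt


if __name__ == "__main__":
    row = {"question": "What is the person doing?", "A": "Running", "B": "Sitting", "answer": "B"}
    print(build_mcq_prompt(row))
```

For a video model, the decoded video would be passed alongside this text; for an image model, sampled frames would take its place.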

video_mcq.py is used the same way as image_mcq.py: convert the benchmark to a TSV file and encode each MP4 video as a base64 string. I have provided a helper function, mp4_to_base64, in vlmeval/dataset/utils/video_mcq_utils.py.
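The conversion step can be sketched with the Python standard library alone. This is an assumed reimplementation for illustration, not the `mp4_to_base64` shipped in video_mcq_utils.py. The column layout (`index`, `video`, `question`, `A` to `D`, `answer`) is modeled on image MCQ TSVs, and the `video` column name in particular is a guess. `base64_to_mp4`, `write_mcq_tsv`, and `read_mcq_tsv` are hypothetical helpers that show the round trip.

```python
import base64
import csv
import os
import tempfile

# Hypothetical column layout; the PR does not state the exact column names.
FIELDS = ["index", "video", "question", "A", "B", "C", "D", "answer"]


def mp4_to_base64(path: str) -> str:
    """Read an MP4 file and return its raw bytes as a base64 ASCII string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")


def base64_to_mp4(data: str, path: str) -> None:
    """Decode a base64 string and write the bytes back out as an MP4 file."""
    with open(path, "wb") as f:
        f.write(base64.b64decode(data))


def write_mcq_tsv(rows: list, tsv_path: str) -> None:
    """Write MCQ rows as tab-separated values; missing option columns become ''."""
    with open(tsv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)


def read_mcq_tsv(tsv_path: str) -> list:
    """Read the TSV back into a list of dicts."""
    # Base64 fields for real videos exceed csv's default 128 KiB field limit.
    csv.field_size_limit(2**31 - 1)
    with open(tsv_path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f, delimiter="\t"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        clip = os.path.join(tmp, "clip.mp4")
        with open(clip, "wb") as f:
            f.write(b"\x00\x00\x00\x18ftypmp42 placeholder")  # stand-in bytes, not a playable video
        rows = [{"index": 0, "video": mp4_to_base64(clip),
                 "question": "What is the person doing?",
                 "A": "Running", "B": "Sitting", "answer": "B"}]
        tsv = os.path.join(tmp, "video_mcq.tsv")
        write_mcq_tsv(rows, tsv)
        restored = os.path.join(tmp, "restored.mp4")
        base64_to_mp4(read_mcq_tsv(tsv)[0]["video"], restored)
        with open(clip, "rb") as a, open(restored, "rb") as b:
            print(a.read() == b.read())
```

Base64 output uses only `A-Z`, `a-z`, `0-9`, `+`, `/` and `=`, so it never collides with the tab delimiter or line breaks. The trade-off is that each encoded video is about a third larger than the original file.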

I added the TaskMeAnything-v1-video-random video benchmark as an example for video_mcq.py and tested it with PaliGemma (an image QA model) and Video-LLaVA (a video QA model); it works well with both.

weikaih04 • Aug 05 '24 19:08