MiniCPM-V 2.6 VideoMME Evaluation #Frames?
The code at https://github.com/OpenBMB/MiniCPM-V/blob/a209258d851f404485e5ae25864417dff3bb74ca/eval_mm/vlmevalkit/vlmeval/dataset/videomme.py uses 8 frames per video, but the leaderboard (https://video-mme.github.io/home_page.html#leaderboard) says 64 frames were used. How many frames were actually used to produce the leaderboard results?
I would also like to know whether 64 frames is an upper limit, i.e. whether the frame count can be lowered but not raised. I want to process a long video of almost two hours, and extracting only 64 frames seems too few to understand it.
Also, could you explain what these parameters mean: max_slice_nums, image_feature_size, and MAX_NUM_FRAMES?
Thanks in advance :)
MiniCPM-V 2.6 has not been evaluated on Video-MME using VLMEvalKit. Our evaluation setting for Video-MME is max_slice_nums=1 and MAX_NUM_FRAMES=64.
max_slice_nums is the maximum number of slices a single image is split into when it is encoded. MAX_NUM_FRAMES is the maximum number of frames extracted from a video. image_feature_size=64 means that each image (or slice) is represented by 64 tokens.
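To make the interaction between these parameters concrete, here is a rough sketch of how a frame cap and a per-image token length translate into sampled frames and a visual token count. The function names sample_frame_indices and estimate_visual_tokens are hypothetical, the midpoint-based uniform sampling is an assumption rather than the confirmed evaluation code, and the 1 fps decoding rate for the two-hour video is only an illustration. The token estimate ignores special tokens and any extra image that slicing may add, so treat it as approximate.

```python
def sample_frame_indices(total_frames, max_num_frames=64):
    """Pick at most max_num_frames indices spread evenly over the video.

    Assumption: uniform sampling at the midpoint of each segment.
    """
    if total_frames <= max_num_frames:
        # Short video: keep every frame.
        return list(range(total_frames))
    gap = total_frames / max_num_frames
    return [int(i * gap + gap / 2) for i in range(max_num_frames)]


def estimate_visual_tokens(num_frames, slices_per_frame=1, image_feature_size=64):
    """Approximate visual token count: frames x slices x tokens per slice."""
    return num_frames * slices_per_frame * image_feature_size


# Hypothetical two-hour video decoded at 1 frame per second.
total_frames = 2 * 60 * 60
indices = sample_frame_indices(total_frames, max_num_frames=64)

print(len(indices))                          # number of frames kept
print(indices[1] - indices[0])               # spacing between kept frames, in seconds at 1 fps
print(estimate_visual_tokens(len(indices)))  # token estimate with max_slice_nums=1
```

With the settings quoted above (max_slice_nums=1, MAX_NUM_FRAMES=64, image_feature_size=64), the estimate comes to 64 x 64 = 4096 visual tokens per video. For a two-hour video the sampled frames end up nearly two minutes apart, which is the sparsity the question is concerned about.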