InternVL About video input

If I pass every frame of a video as multi-image input, is that roughly equivalent to video input?
And how can I control the fps of the video input?

Apr 29 '25 08:04 payphone131

Hi,

You are right. Video input is handled by extracting frames from the video and feeding them to the model as multiple images. Please refer to the usage example in README.md. You can control the number of sampled frames by setting num_segments.
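
As a rough illustration of how a fixed num_segments relates to an effective fps, here is a minimal sketch of uniform frame sampling. The function names `sample_frame_indices` and `segments_for_target_fps` are hypothetical and are not taken from the InternVL README; the README's own example may differ in details such as rounding and time bounds.

```python
from typing import List


def sample_frame_indices(total_frames: int, num_segments: int) -> List[int]:
    """Split the video into num_segments equal spans and take the middle frame of each.

    Hypothetical helper for illustration only, not the InternVL implementation.
    """
    if total_frames <= 0 or num_segments <= 0:
        raise ValueError("total_frames and num_segments must be positive")
    seg_size = total_frames / num_segments
    # Clamp to the last valid index so short videos never index out of range.
    return [
        min(total_frames - 1, int(seg_size * i + seg_size / 2))
        for i in range(num_segments)
    ]


def segments_for_target_fps(total_frames: int, video_fps: float, target_fps: float) -> int:
    """Choose num_segments so that sampling approximates target_fps over the whole video."""
    if video_fps <= 0 or target_fps <= 0:
        raise ValueError("video_fps and target_fps must be positive")
    duration_s = total_frames / video_fps
    return max(1, round(duration_s * target_fps))


if __name__ == "__main__":
    # A 10-second clip recorded at 30 fps, sampled at roughly 1 frame per second.
    n = segments_for_target_fps(total_frames=300, video_fps=30.0, target_fps=1.0)
    print(n, sample_frame_indices(300, n))
```

Under this scheme the effective sampling rate is num_segments divided by the clip duration, so controlling fps amounts to choosing num_segments per video rather than using one fixed value.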

May 07 '25 13:05 yuecao0119

@yuecao0119 Can this model achieve video grounding tasks? I ask because I found that the input doesn't take fps or num_segments as a parameter.

May 15 '25 03:05 babyhyf

Is there a best practice for choosing the number of sampled frames in video understanding tasks? And what is the maximum number of frames that can be sampled from a video?

Sep 21 '25 14:09 LiYufengzz