About video input

Open payphone131 opened this issue 7 months ago • 3 comments

  1. If I pass every frame of a video as multi-image input, is that roughly equivalent to video input?
  2. How can I control the fps of the video input?

payphone131, Apr 29 '25 08:04

Hi,

You are right. Video input is handled by extracting frames from the video and passing them as multiple images. Please refer to the usage example in README.md. You control how many frames are sampled by setting num_segments.
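To make the relationship between num_segments and an effective frame rate concrete, here is a minimal sketch of uniform frame sampling. It is not the code from README.md: the function names `sample_frame_indices` and `segments_for_target_fps` are hypothetical, and the choice of taking the middle frame of each segment is an assumption made for illustration.

```python
def sample_frame_indices(total_frames, num_segments):
    """Split the video into num_segments equal spans and pick the middle frame of each.

    Hypothetical helper for illustration; not part of the InternVL code base.
    """
    if total_frames <= 0 or num_segments <= 0:
        raise ValueError("total_frames and num_segments must be positive")
    seg_size = total_frames / num_segments
    # Clamp to the last valid index in case of rounding at the end of the video.
    return [
        min(total_frames - 1, int(seg_size * i + seg_size / 2))
        for i in range(num_segments)
    ]


def segments_for_target_fps(total_frames, native_fps, target_fps):
    """Convert a desired sampling rate (frames per second) into a num_segments value."""
    duration_s = total_frames / native_fps
    return max(1, round(duration_s * target_fps))


# A 10-second clip recorded at 30 fps, sampled at roughly 1 frame per second.
n = segments_for_target_fps(total_frames=300, native_fps=30.0, target_fps=1.0)
indices = sample_frame_indices(total_frames=300, num_segments=n)
```

Under these assumptions, a fixed num_segments means the effective fps drops as the video gets longer, so an fps-like control amounts to computing num_segments from the clip duration before sampling. If num_segments exceeds the number of frames, this sketch returns repeated indices.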

yuecao0119, May 07 '25 13:05

@yuecao0119 Can this model achieve video grounding tasks? I ask because I found that the input doesn't take fps or num_segments.

babyhyf, May 15 '25 03:05

Is there a best practice for choosing the number of sampled frames for video understanding tasks? And what is the maximum number of frames that can be sampled from a video?

LiYufengzz, Sep 21 '25 14:09