InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
How can I implement the moment retrieval and abnormal event detection in surveillance videos as shown in Figure 6 of the paper? I have tried using the demo code from...
I'm trying to run InternVideo2_5_Chat_8B on a single RTX 4090 (24 GB) and get a segmentation fault (core dumped). Could this be insufficient GPU memory?
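A segmentation fault during loading often points to host RAM or dtype handling rather than GPU memory alone. A minimal loading sketch, assuming the Hugging Face checkpoint ID below is the one being used and that bfloat16 weights of an 8B model fit in 24 GB (not a confirmed fix, just a common setup with standard transformers arguments):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed checkpoint ID; replace with the exact repo you downloaded.
model_id = "OpenGVLab/InternVideo2_5_Chat_8B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Load weights directly in bfloat16 and avoid a full fp32 copy in host RAM
# (low_cpu_mem_usage), then move the model to the single 24 GB GPU.
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).eval().cuda()
```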
Thanks for the great work! I am currently trying to conduct SFT on InternVideo2-Stage3. However, it seems that the code for instruction tuning (Stage 2 -> Stage 3) is still not...
Great work! I have tried the demo on Hugging Face. May I ask how to reproduce the results in Figure 4 and Figure 5 of the paper? i.e., retrieve the specific...
When I change the segment size to 130, I get an error: RuntimeError: shape mismatch: value tensor of shape [2048, 4096] cannot be broadcast to indexing result...
1. In the newest demo.py on Hugging Face and ModelScope, IMAGENET_MEAN and IMAGENET_STD are never set. 2. After fixing this and running demo.py, the error is: Input type (c10::BFloat16) and bias type (c10::Half) should be the same....
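For reference, a minimal sketch of the two fixes implied above, assuming the demo's preprocessing uses standard ImageNet statistics and that the dtype error comes from the input tensor being half precision while the weights are bfloat16 (the names below are illustrative, not taken from the actual demo.py):

```python
import torch
import torchvision.transforms as T

# Standard ImageNet normalization statistics the demo script leaves undefined.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

transform = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(IMAGENET_MEAN, IMAGENET_STD),
])

def match_model_dtype(pixel_values: torch.Tensor, model: torch.nn.Module) -> torch.Tensor:
    # Cast the input to the dtype/device of the model weights so input and
    # bias agree (the BFloat16-vs-Half mismatch reported above).
    param = next(model.parameters())
    return pixel_values.to(dtype=param.dtype, device=param.device)
```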
I downloaded the model and code from Hugging Face, but when I run the example code, the line model.chat(video_path=video_path, ...) raises TypeError: chat() got an unexpected keyword argument 'video_path'. Also, image_processor = model.get_vision_tower().image_processor fails with 'InternVLChatModel' object has no attribute 'get_vision_tower'. I suspect part of the model's code may have been uploaded incorrectly (e.g., the InternVL2.5 code was uploaded instead). Could the authors please check whether the wrong code was uploaded?
Hello, is there a recommended fps for the 8-frame input to the InternVideo2 Stage 2 model? The paper mentions sparse sampling but doesn't give many details.
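For context, sparse sampling in this setting usually means spreading a fixed number of frames evenly over the whole clip rather than decoding at a fixed fps. A minimal sketch of such uniform 8-frame sampling with decord (the helper name and use of decord are assumptions, not the Stage 2 training code):

```python
import numpy as np
from decord import VideoReader, cpu

def sample_frames_uniform(video_path: str, num_frames: int = 8) -> np.ndarray:
    """Sparsely sample `num_frames` frames spread evenly over the video,
    independent of the clip's native fps or duration."""
    vr = VideoReader(video_path, ctx=cpu(0))
    total = len(vr)
    # Pick evenly spaced frame indices across the full clip.
    indices = np.linspace(0, total - 1, num=num_frames, dtype=int)
    return vr.get_batch(indices).asnumpy()  # (num_frames, H, W, 3)
```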
Hi, the example given on Hugging Face uses a video path as input. Can the model also take multiple images as input? If so, could you give a code example of how...
I downloaded the model "InternVideo2-Stage1-1B-224p-f8", but I cannot find inference code for it, nor any method for feature extraction.
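Until official inference code is pointed out, clip-level feature extraction for a video encoder of this kind typically follows the pattern below. This is only a sketch under assumptions: the forward signature, the (1, 3, T, H, W) input layout, and the absence of normalization are all hypothetical, not the repository's actual API.

```python
import torch

@torch.no_grad()
def extract_clip_embedding(model: torch.nn.Module, frames: torch.Tensor) -> torch.Tensor:
    """frames: uint8 tensor of shape (T, H, W, 3), e.g. 8 frames at 224x224.
    Returns whatever embedding the encoder produces for the clip."""
    # Rearrange to (1, 3, T, H, W) and scale to [0, 1]; normalization with the
    # training mean/std would also go here.
    pixel_values = frames.permute(3, 0, 1, 2).unsqueeze(0).float() / 255.0
    return model(pixel_values)  # hypothetical forward signature
```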