InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
How can I implement the moment retrieval and abnormal event detection in surveillance videos as shown in Figure 6 of the paper? I have tried using the demo code from...
I'm trying to run InternVideo2_5_Chat_8B on a single RTX 4090 (24 GB) and get a segmentation fault (core dumped). Could this be insufficient GPU memory?
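A segmentation fault during loading often points to host RAM or dtype handling rather than GPU memory alone. A minimal loading sketch, assuming the Hugging Face checkpoint ID below is the one being used and that bfloat16 weights of an 8B model fit in 24 GB (not a confirmed fix, just a common setup with standard transformers arguments):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed checkpoint ID; replace with the exact repo you downloaded.
model_id = "OpenGVLab/InternVideo2_5_Chat_8B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Load weights directly in bfloat16 and avoid a full fp32 copy in host RAM
# (low_cpu_mem_usage), then move the model to the single 24 GB GPU.
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).eval().cuda()
```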
Thanks for the great work! I am currently trying to conduct SFT on InternVideo2-Stage3. However, it seems that the code for instruction tuning (Stage 2 -> Stage 3) is still not...
Great work! I have tried the demo on Hugging Face. May I ask how to reproduce the results in Figure 4 and Figure 5 of the paper? i.e., retrieve the specific...
When I change the segment size to 130, I get an error: RuntimeError: shape mismatch: value tensor of shape [2048, 4096] cannot be broadcast to indexing result...
1. In the newest demo.py on Hugging Face and ModelScope, IMAGENET_MEAN and IMAGENET_STD are never set. 2. After fixing this and running demo.py, the error is: Input type (c10::BFloat16) and bias type (c10::Half) should be the same....
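For reference, a minimal sketch of the two fixes implied above, assuming the demo's preprocessing uses standard ImageNet statistics and that the dtype error comes from the input tensor being half precision while the weights are bfloat16 (the names below are illustrative, not taken from the actual demo.py):

```python
import torch
import torchvision.transforms as T

# Standard ImageNet normalization statistics the demo script leaves undefined.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

transform = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(IMAGENET_MEAN, IMAGENET_STD),
])

def match_model_dtype(pixel_values: torch.Tensor, model: torch.nn.Module) -> torch.Tensor:
    # Cast the input to the dtype/device of the model weights so input and
    # bias agree (the BFloat16-vs-Half mismatch reported above).
    param = next(model.parameters())
    return pixel_values.to(dtype=param.dtype, device=param.device)
```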
I downloaded the model and code from Hugging Face, but when I run the example code, the line model.chat(video_path=video_path, ...) raises TypeError: chat() got an unexpected keyword argument 'video_path'. Also, image_processor = model.get_vision_tower().image_processor fails with 'InternVLChatModel' object has no attribute 'get_vision_tower'. I suspect part of the model's code may have been uploaded incorrectly (e.g., the InternVL2.5 code was uploaded instead). Could the authors please check whether the wrong code was uploaded?
Hello, is there a recommended fps for the 8-frame input to the InternVideo2 Stage 2 model? The paper mentions sparse sampling but doesn't give many details.
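For context, sparse sampling in this setting usually means spreading a fixed number of frames evenly over the whole clip rather than decoding at a fixed fps. A minimal sketch of such uniform 8-frame sampling with decord (the helper name and use of decord are assumptions, not the Stage 2 training code):

```python
import numpy as np
from decord import VideoReader, cpu

def sample_frames_uniform(video_path: str, num_frames: int = 8) -> np.ndarray:
    """Sparsely sample `num_frames` frames spread evenly over the video,
    independent of the clip's native fps or duration."""
    vr = VideoReader(video_path, ctx=cpu(0))
    total = len(vr)
    # Pick evenly spaced frame indices across the full clip.
    indices = np.linspace(0, total - 1, num=num_frames, dtype=int)
    return vr.get_batch(indices).asnumpy()  # (num_frames, H, W, 3)
```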
Hi, the example given on Hugging Face uses a video path as input. Can the model also take multiple images as input? If so, could you give a code example of how...
I downloaded the model "InternVideo2-Stage1-1B-224p-f8", but I cannot find inference code for it, nor any method for feature extraction.
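Until official inference code is pointed out, clip-level feature extraction for a video encoder of this kind typically follows the pattern below. This is only a sketch under assumptions: the forward signature, the (1, 3, T, H, W) input layout, and the absence of normalization are all hypothetical, not the repository's actual API.

```python
import torch

@torch.no_grad()
def extract_clip_embedding(model: torch.nn.Module, frames: torch.Tensor) -> torch.Tensor:
    """frames: uint8 tensor of shape (T, H, W, 3), e.g. 8 frames at 224x224.
    Returns whatever embedding the encoder produces for the clip."""
    # Rearrange to (1, 3, T, H, W) and scale to [0, 1]; normalization with the
    # training mean/std would also go here.
    pixel_values = frames.permute(3, 0, 1, 2).unsqueeze(0).float() / 255.0
    return model(pixel_values)  # hypothetical forward signature
```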