
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

170 InternVideo issues

The model mistakes hands for feet, makes up details on its own, and describes nail clipping as playing with a phone. Prompt: data = {"question": "Describe the video content and add timestamps"} {'answer': 'In the video, a man in a dark top and light-colored pants sits on a sofa, using both hands to give a foot massage to a woman lying on it. The woman wears a black top and green pants, with her legs stretched straight out on the sofa and her head resting against the backrest. The man massages her feet attentively, with a technique that looks professional and firm. The room is brightly lit, with natural light coming in through the window. The sofa is dark gray, with an orange cushion next to it. The whole scene feels warm and comfortable, highlighting the relaxing effect of the massage.'} Is this something I should fix by adjusting the temperature?
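On the temperature question: a minimal sketch, assuming a standard `transformers` generation interface (not necessarily InternVideo's exact chat API), of how the sampling temperature is usually lowered. A lower temperature makes the decoder less prone to invented details, though it will not fix genuine recognition errors such as hands vs. feet.

```python
# Minimal sketch: lowering the sampling temperature with a standard
# transformers generate() call. model/inputs are placeholders, not
# InternVideo's actual chat interface.
from transformers import GenerationConfig

gen_cfg = GenerationConfig(
    do_sample=True,   # temperature only has an effect when sampling is on
    temperature=0.2,  # lower -> more conservative, fewer invented details
    top_p=0.9,
    max_new_tokens=256,
)
# outputs = model.generate(**inputs, generation_config=gen_cfg)
```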

Hi authors, I appreciate your amazing work. I am using InternVideo2-1B as the backbone in my work, and I want to train a CLIP-style model with our customized dataset. Could you provide the...
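For readers with the same goal, a minimal sketch of the symmetric contrastive (InfoNCE) loss that CLIP-style training uses; `video_emb` and `text_emb` are hypothetical outputs of an InternVideo2 video encoder and a text encoder, not names from the repo.

```python
# Minimal sketch of the symmetric InfoNCE loss behind CLIP-style training.
# video_emb, text_emb: (B, D) features for B matched video-text pairs.
import torch
import torch.nn.functional as F

def clip_loss(video_emb, text_emb, logit_scale):
    video_emb = F.normalize(video_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = logit_scale * video_emb @ text_emb.t()  # (B, B) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # symmetric cross-entropy: video->text and text->video directions
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```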

Hello, I am trying to use the Hugging Face model [`OpenGVLab/InternVideo2-Stage1-1B-224p-K400`](https://huggingface.co/OpenGVLab/InternVideo2-Stage1-1B-224p-K400) with the `transformers` library for video feature extraction. When I call: ```python from transformers import AutoImageProcessor processor = AutoImageProcessor.from_pretrained("OpenGVLab/InternVideo2-Stage1-1B-224p-K400")...
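One hedged workaround, assuming the failure is simply that the checkpoint ships no `preprocessor_config.json` for `AutoImageProcessor` to read: preprocess the frames manually. The 224px resize and ImageNet statistics below are assumptions inferred from the model name, not confirmed defaults.

```python
# Minimal sketch of a manual preprocessing fallback when
# AutoImageProcessor cannot load a config for the checkpoint.
import torch
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet stats,
                         std=[0.229, 0.224, 0.225]),   # assumed defaults
])
# frames: list of PIL.Image video frames -> (T, C, H, W) tensor
# clip = torch.stack([preprocess(f) for f in frames])
```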

During model loading, the checkpoint weights and vocab size seem to be wrong. Below is the code I used to generate this result, which has also been mentioned by...
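A hedged debugging sketch for this kind of mismatch: load the checkpoint on CPU and print the shapes of the embedding/head tensors, then compare them against the instantiated model. The file path and key-name filters below are placeholders.

```python
# Minimal sketch for diagnosing weight/vocab-size mismatches: compare
# tensor shapes in the checkpoint against the instantiated model.
import torch

ckpt = torch.load("checkpoint.pth", map_location="cpu")
state = ckpt.get("state_dict", ckpt)  # some checkpoints nest the weights
for name, tensor in state.items():
    if any(k in name for k in ("embed", "vocab", "head")):
        print(name, tuple(tensor.shape))
# compare with:
# {n: tuple(p.shape) for n, p in model.named_parameters()}
```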

@shepnerd @leexinhao Hi, huge thanks to the authors for releasing this amazing project! I'm looking forward to using your model and data in my own research. Regarding video-text data curation,...

Hi OpenGVLab team, thank you very much for all your excellent models. In the InternVideo 2.5 paper section 3.1, it is mentioned that: > (1) uniform token pruning in early...
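As a rough illustration of what the quoted phrase describes, here is a minimal, generic sketch of uniform token pruning: keeping tokens at a fixed stride along the sequence. It is not the paper's actual implementation.

```python
# Minimal sketch of uniform token pruning: keep evenly spaced tokens
# along the sequence dimension in early layers.
import torch

def uniform_prune(tokens, keep_ratio=0.5):
    """tokens: (B, N, D) -> (B, ~N*keep_ratio, D), sampled uniformly."""
    n = tokens.size(1)
    keep = max(1, int(n * keep_ratio))
    idx = torch.linspace(0, n - 1, steps=keep, device=tokens.device).long()
    return tokens[:, idx, :]
```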

Hi OpenGVLab team, thank you for all the great models. Would your team be releasing the method for InternVideo2.5 Stage 2 like you did for InternVideo2?

Thank you for your work! I have a question: in the paper, it is stated: "We also learn a CLIP-style InternVideo2 indicated by InternVideo2_clip. It is post-pretrained from InternVideo2_s2 by...

How much VRAM is required to run this model efficiently?
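No official figure is given here, so the best one can do is a back-of-envelope estimate from the parameter count. The sketch below assumes the 1B-parameter encoder and counts weights only; activations and any cached features add overhead on top.

```python
# Back-of-envelope VRAM estimate for a 1B-parameter model (weights only).
params = 1e9
bytes_per_param = {"fp32": 4, "fp16/bf16": 2, "int8": 1}
for dtype, b in bytes_per_param.items():
    print(f"{dtype}: ~{params * b / 2**30:.1f} GiB for weights")
# fp16/bf16 weights alone come to roughly 1.9 GiB; with activation
# headroom, an 8-16 GB GPU is a rough guess, not a confirmed requirement.
```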