
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

170 InternVideo issues

The model mistakes hands for feet, makes up details on its own, and describes nail clipping as playing with a phone. Prompt: data = {"question": "Describe the video content and add timestamps"} {'answer': 'In the video, a man in a dark top and light-colored pants sits on a sofa, using both hands to give a foot massage to a woman lying on it. The woman wears a black top and green pants, with her legs stretched straight out on the sofa and her head resting against the backrest. The man massages her feet attentively, with a technique that looks professional and firm. The room is brightly lit, with natural light coming in through the window. The sofa is dark gray, with an orange cushion next to it. The whole scene feels warm and comfortable, highlighting the relaxing effect of the massage.'} Is this something I should fix by adjusting the temperature?
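On the temperature question: a minimal sketch, assuming a standard `transformers` generation interface (not necessarily InternVideo's exact chat API), of how the sampling temperature is usually lowered. A lower temperature makes the decoder less prone to invented details, though it will not fix genuine recognition errors such as hands vs. feet.

```python
# Minimal sketch: lowering the sampling temperature with a standard
# transformers generate() call. model/inputs are placeholders, not
# InternVideo's actual chat interface.
from transformers import GenerationConfig

gen_cfg = GenerationConfig(
    do_sample=True,   # temperature only has an effect when sampling is on
    temperature=0.2,  # lower -> more conservative, fewer invented details
    top_p=0.9,
    max_new_tokens=256,
)
# outputs = model.generate(**inputs, generation_config=gen_cfg)
```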

Hi authors, I appreciate your amazing work. I am using InternVideo2-1B as the backbone in my work, and I want to train a CLIP-style model with our customized dataset. Could you provide the...
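For readers with the same goal, a minimal sketch of the symmetric contrastive (InfoNCE) loss that CLIP-style training uses; `video_emb` and `text_emb` are hypothetical outputs of an InternVideo2 video encoder and a text encoder, not names from the repo.

```python
# Minimal sketch of the symmetric InfoNCE loss behind CLIP-style training.
# video_emb, text_emb: (B, D) features for B matched video-text pairs.
import torch
import torch.nn.functional as F

def clip_loss(video_emb, text_emb, logit_scale):
    video_emb = F.normalize(video_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = logit_scale * video_emb @ text_emb.t()  # (B, B) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # symmetric cross-entropy: video->text and text->video directions
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```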

Hello, I am trying to use the Hugging Face model [`OpenGVLab/InternVideo2-Stage1-1B-224p-K400`](https://huggingface.co/OpenGVLab/InternVideo2-Stage1-1B-224p-K400) with the `transformers` library for video feature extraction. When I call: ```python from transformers import AutoImageProcessor processor = AutoImageProcessor.from_pretrained("OpenGVLab/InternVideo2-Stage1-1B-224p-K400")...
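One hedged workaround, assuming the failure is simply that the checkpoint ships no `preprocessor_config.json` for `AutoImageProcessor` to read: preprocess the frames manually. The 224px resize and ImageNet statistics below are assumptions inferred from the model name, not confirmed defaults.

```python
# Minimal sketch of a manual preprocessing fallback when
# AutoImageProcessor cannot load a config for the checkpoint.
import torch
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet stats,
                         std=[0.229, 0.224, 0.225]),   # assumed defaults
])
# frames: list of PIL.Image video frames -> (T, C, H, W) tensor
# clip = torch.stack([preprocess(f) for f in frames])
```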

During model loading, the checkpoint weights and vocab size seem to be wrong. Below is the code I used to generate this result, which has also been mentioned by...
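A hedged debugging sketch for this kind of mismatch: load the checkpoint on CPU and print the shapes of the embedding/head tensors, then compare them against the instantiated model. The file path and key-name filters below are placeholders.

```python
# Minimal sketch for diagnosing weight/vocab-size mismatches: compare
# tensor shapes in the checkpoint against the instantiated model.
import torch

ckpt = torch.load("checkpoint.pth", map_location="cpu")
state = ckpt.get("state_dict", ckpt)  # some checkpoints nest the weights
for name, tensor in state.items():
    if any(k in name for k in ("embed", "vocab", "head")):
        print(name, tuple(tensor.shape))
# compare with:
# {n: tuple(p.shape) for n, p in model.named_parameters()}
```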

@shepnerd @leexinhao Hi, huge thanks to the authors for releasing this amazing project! I'm looking forward to using your model and data in my own research. Regarding video-text data curation,...

Hi OpenGVLab team, thank you very much for all your excellent models. In the InternVideo 2.5 paper section 3.1, it is mentioned that: > (1) uniform token pruning in early...
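As a rough illustration of what the quoted phrase describes, here is a minimal, generic sketch of uniform token pruning: keeping tokens at a fixed stride along the sequence. It is not the paper's actual implementation.

```python
# Minimal sketch of uniform token pruning: keep evenly spaced tokens
# along the sequence dimension in early layers.
import torch

def uniform_prune(tokens, keep_ratio=0.5):
    """tokens: (B, N, D) -> (B, ~N*keep_ratio, D), sampled uniformly."""
    n = tokens.size(1)
    keep = max(1, int(n * keep_ratio))
    idx = torch.linspace(0, n - 1, steps=keep, device=tokens.device).long()
    return tokens[:, idx, :]
```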

Hi OpenGVLab team, thank you for all the great models. Would your team be releasing the method for InternVideo2.5 Stage 2 like you did for InternVideo2?

Thank you for your work! I have a question: in the paper, it is stated: "We also learn a CLIP-style InternVideo2 indicated by InternVideo2_clip. It is post-pretrained from InternVideo2_s2 by...

How much VRAM is required to run this model efficiently?
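No official figure is given here, so the best one can do is a back-of-envelope estimate from the parameter count. The sketch below assumes the 1B-parameter encoder and counts weights only; activations and any cached features add overhead on top.

```python
# Back-of-envelope VRAM estimate for a 1B-parameter model (weights only).
params = 1e9
bytes_per_param = {"fp32": 4, "fp16/bf16": 2, "int8": 1}
for dtype, b in bytes_per_param.items():
    print(f"{dtype}: ~{params * b / 2**30:.1f} GiB for weights")
# fp16/bf16 weights alone come to roughly 1.9 GiB; with activation
# headroom, an 8-16 GB GPU is a rough guess, not a confirmed requirement.
```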