
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

170 InternVideo issues, sorted by recently updated:

According to the README at https://github.com/OpenGVLab/InternVideo/tree/main/InternVideo1/Downstream/Video-Text-Retrieval, the zero-shot retrieval results are obtained by running `./zeroshot_scripts/eval_msrvtt.sh`, which executes `main_task_retrieval.py`. But in `main_task_retrieval.py`, I find that...
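
For context, CLIP-style zero-shot retrieval evaluation boils down to ranking candidates by cosine similarity in a shared embedding space. Below is a minimal sketch of that scoring step; `video_encoder` and `text_encoder` are hypothetical stand-ins, not the actual modules wired up by `main_task_retrieval.py`:

```python
import torch
import torch.nn.functional as F

def retrieval_scores(video_encoder, text_encoder, videos, captions):
    """Score every caption against every video (minimal sketch of
    CLIP-style zero-shot retrieval; not the repository's actual code)."""
    with torch.no_grad():
        v = F.normalize(video_encoder(videos), dim=-1)   # (N_videos, D)
        t = F.normalize(text_encoder(captions), dim=-1)  # (N_texts, D)
    sim = t @ v.T  # cosine similarities, shape (N_texts, N_videos)

    # Text-to-video Recall@1, assuming the i-th caption describes the
    # i-th video (the standard MSR-VTT evaluation pairing).
    top1 = sim.argmax(dim=-1)
    r1 = (top1 == torch.arange(sim.size(0))).float().mean().item()
    return sim, r1
```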

While attempting to set up and run the demo notebook from the repository, I encountered multiple issues related to environment setup, package dependencies, and code configurations that significantly hindered progress....

Running demo/demo.ipynb directly with the model https://huggingface.co/OpenGVLab/InternVideo2-Stage2_1B-224p-f4/blob/main/InternVideo2-stage2_1b-224p-f4.pt gives unsatisfactory results. First, two changes are needed before the model loads correctly: 1. in demo/demo.ipynb, add `config['pretrained_path'] = model_pth` before the call to `setup_internvideo2(config)`; 2. in demo/utils.py, change lines 82 and 84 to `is_pretrain=True`. After these changes, looking at the similarity scores (without softmax) between the video provided in the demo and the ten candidate sentences, the highest-scoring sentence is not the correct description, and all ten scores are close to one another.
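
Concretely, the two fixes described above look roughly like this (a sketch based on this report; the exact call sites in demo/utils.py may differ):

```python
# In demo/demo.ipynb: point the config at the downloaded checkpoint
# before building the model, otherwise the weights are never loaded.
model_pth = "/path/to/InternVideo2-stage2_1b-224p-f4.pt"
config['pretrained_path'] = model_pth
# ... then call setup_internvideo2(config) as the notebook already does.

# In demo/utils.py (lines 82 and 84): construct the model with
# is_pretrain=True so the stage-2 checkpoint keys resolve correctly,
# e.g. (illustrative, not the file's exact code):
#   model = InternVideo2_Stage2(config=config, tokenizer=tokenizer,
#                               is_pretrain=True)
```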

Hello, I really appreciate your great work. In https://github.com/OpenGVLab/InternVideo/blob/main/InternVideo2/multi_modality/MODEL_ZOO.md I see that you wrote: "We also learn a CLIP-style InternVideo2 indicated by InternVideo2_clip. It is post-pretrained from InternVideo2_s2 by only...

Hi, do you have any Docker image for S2 inference? For some reason I need to build a Docker container for inference, or use an available Docker image for the...

Hi, thanks for your great work! I'm looking at the newly released InternVideo2 model, and it's interesting! I saw the demo.ipynb file in the multi_modality folder; it can calculate text probabilities. I'm wondering...
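
For reference, the text probabilities in a CLIP-style demo are typically just a softmax over the video-text similarity logits, along these lines (a minimal sketch; the temperature value and helper name are assumptions, not the notebook's exact code):

```python
import torch

def text_probs(similarities: torch.Tensor, temperature: float = 0.01) -> torch.Tensor:
    """Convert raw video-text cosine similarities into a probability
    distribution over candidate texts (hypothetical helper; CLIP-style
    models divide by a learned temperature before the softmax)."""
    return torch.softmax(similarities / temperature, dim=-1)

# Example: one video scored against 10 candidate sentences. Dividing by a
# small temperature sharpens the distribution; without it, close raw
# scores (as reported above) would yield a nearly flat one.
sims = torch.tensor([0.31, 0.28, 0.30, 0.27, 0.29, 0.33, 0.30, 0.28, 0.26, 0.32])
print(text_probs(sims))
```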

Thank you for your selfless sharing. May I ask when the open-source Video Temporal Grounding test code will be available? Looking forward to your reply.

As described in the paper, the InternVid dataset contains 234M clips in all, but the largest publicly available subset has only 18M. Do...

Thanks for your great work! Is there a .md file with training and testing instructions?

In "InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation," I would like to use ViCLIP-B-16 on InternVid-200M. Does this dataset ( or InternVid-FLT) contain videos from Kinetics400, SSV2,...