
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

170 InternVideo issues, sorted by recently updated:

According to the README at https://github.com/OpenGVLab/InternVideo/tree/main/InternVideo1/Downstream/Video-Text-Retrieval, the zero-shot retrieval results are obtained by running `./zeroshot_scripts/eval_msrvtt.sh`, which executes `main_task_retrieval.py`. But in `main_task_retrieval.py`, I find that...
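
For context, CLIP-style zero-shot retrieval evaluation boils down to ranking candidates by cosine similarity in a shared embedding space. Below is a minimal sketch of that scoring step; `video_encoder` and `text_encoder` are hypothetical stand-ins, not the actual modules wired up by `main_task_retrieval.py`:

```python
import torch
import torch.nn.functional as F

def retrieval_scores(video_encoder, text_encoder, videos, captions):
    """Score every caption against every video (minimal sketch of
    CLIP-style zero-shot retrieval; not the repository's actual code)."""
    with torch.no_grad():
        v = F.normalize(video_encoder(videos), dim=-1)   # (N_videos, D)
        t = F.normalize(text_encoder(captions), dim=-1)  # (N_texts, D)
    sim = t @ v.T  # cosine similarities, shape (N_texts, N_videos)

    # Text-to-video Recall@1, assuming the i-th caption describes the
    # i-th video (the standard MSR-VTT evaluation pairing).
    top1 = sim.argmax(dim=-1)
    r1 = (top1 == torch.arange(sim.size(0))).float().mean().item()
    return sim, r1
```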

While attempting to set up and run the demo notebook from the repository, I encountered multiple issues related to environment setup, package dependencies, and code configurations that significantly hindered progress....

Running demo/demo.ipynb directly with the model https://huggingface.co/OpenGVLab/InternVideo2-Stage2_1B-224p-f4/blob/main/InternVideo2-stage2_1b-224p-f4.pt gives unsatisfactory results. First, two changes are needed before the model loads correctly: 1. in demo/demo.ipynb, add `config['pretrained_path'] = model_pth` before the call to `setup_internvideo2(config)`; 2. in demo/utils.py, change lines 82 and 84 to `is_pretrain=True`. After these changes, looking at the similarity scores (without softmax) between the video provided in the demo and the ten candidate sentences, the highest-scoring sentence is not the correct description, and all ten scores are close to one another.
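
Concretely, the two fixes described above look roughly like this (a sketch based on this report; the exact call sites in demo/utils.py may differ):

```python
# In demo/demo.ipynb: point the config at the downloaded checkpoint
# before building the model, otherwise the weights are never loaded.
model_pth = "/path/to/InternVideo2-stage2_1b-224p-f4.pt"
config['pretrained_path'] = model_pth
# ... then call setup_internvideo2(config) as the notebook already does.

# In demo/utils.py (lines 82 and 84): construct the model with
# is_pretrain=True so the stage-2 checkpoint keys resolve correctly,
# e.g. (illustrative, not the file's exact code):
#   model = InternVideo2_Stage2(config=config, tokenizer=tokenizer,
#                               is_pretrain=True)
```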

Hello, I really appreciate your great work. In https://github.com/OpenGVLab/InternVideo/blob/main/InternVideo2/multi_modality/MODEL_ZOO.md I see that you wrote: "We also learn a CLIP-style InternVideo2 indicated by InternVideo2_clip. It is post-pretrained from InternVideo2_s2 by only...

Hi, do you have any Docker image for S2 inference? For some reason I need to build a Docker container for inference, or use an available Docker image for the...

Hi, thanks for your great work! I'm looking at the newly released InternVideo2 model, and it's interesting! I saw the demo.ipynb file in the multi_modality folder; it can calculate text probabilities. I'm wondering...
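
For reference, the text probabilities in a CLIP-style demo are typically just a softmax over the video-text similarity logits, along these lines (a minimal sketch; the temperature value and helper name are assumptions, not the notebook's exact code):

```python
import torch

def text_probs(similarities: torch.Tensor, temperature: float = 0.01) -> torch.Tensor:
    """Convert raw video-text cosine similarities into a probability
    distribution over candidate texts (hypothetical helper; CLIP-style
    models divide by a learned temperature before the softmax)."""
    return torch.softmax(similarities / temperature, dim=-1)

# Example: one video scored against 10 candidate sentences. Dividing by a
# small temperature sharpens the distribution; without it, close raw
# scores (as reported above) would yield a nearly flat one.
sims = torch.tensor([0.31, 0.28, 0.30, 0.27, 0.29, 0.33, 0.30, 0.28, 0.26, 0.32])
print(text_probs(sims))
```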

Thank you for your selfless sharing. May I ask when the open-source Video Temporal Grounding test code will be available? Looking forward to your reply.

As described in the paper, the InternVid dataset contains 234M clips in all, but the largest publicly available subset has only 18M. Do...

Thanks for your great work! Is there a .md file with training and testing instructions?

In "InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation," I would like to use ViCLIP-B-16 on InternVid-200M. Does this dataset ( or InternVid-FLT) contain videos from Kinetics400, SSV2,...