InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Results 170 InternVideo issues

I want to fine-tune InternVideo2-Stage2_1B-224p-f4 on ActivityNet. I adjusted the data config in data.py. My data is: `available_corpus["anet_ret_val"] = dict( anno_path=".../ActivityNet/anno_downstream/anet_ret_val.json", data_root=".../ActivityNet", media_type="video", is_paragraph_retrieval=True, max_txt_l = 150 ) available_corpus["anet_ret_train"] =...

Thanks for your great work. I use it for text-video retrieval; I tried internvideo2-s2_1b-224p-f4 from the demo and it worked well. You recently released the 6B version. I noticed...

Hi, thanks for sharing this solid model. The paper mentions pooling ablations in section X, which I believe is in the appendix. But the arXiv version does...

Hello, is there an approximate release date for the 6B model that was used to achieve the rank 1 result on the [MSR-VTT leaderboard](https://paperswithcode.com/sota/zero-shot-video-retrieval-on-msr-vtt)?

Hi team, could you please clarify the release timeline for the weights of `InternVideo2 S2-6B` for video-text retrieval? If they have already been released, could you kindly share the...

Hi, thank you for your great work! I want to use InternVideo2 as a backbone. I installed flash-attn 2.7.3 and dropout_layer_norm from source, but now I am struggling with this error...

When will InternVideo2.5 open-source its SFT code, as InternVL2 did? Will the training code be released to facilitate follow-up work? 🙏

Thank you for contributing such outstanding work. Does InternVideo2 support searching videos with Chinese text? What model do I need to replace the VisionEncoder and TextEncoder with?...

I saw the description of Adaptive Temporal Sampling in Section 3.1 of the paper, but it does not seem to explain how it is technically implemented.