InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Results 170 InternVideo issues

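I want the model to output the coordinates of a given action in every frame. I tried prompting it to output coordinates, but it replied that it cannot output pixel coordinate values. How is this done in the paper? Could anyone share some reference code or ideas? Thanks.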

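Hello, I downloaded MSRVTT from the address given in the instructions, but it contains many test lists. Which one is used? After extraction, the MSRVTT directory contains: annotation high-quality structured-symlinks videos. Which file in which folder is test_1k? Is it MSRVTT/structured-symlinks/val_list_jsfusion.txt?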

When I try to fine-tune stage 2 of InternVideo2 with num_frames set to 12, I get the error below: ```python [rank0]: File "/root/nginx/multi_modality/tasks/shared_utils.py", line 192, in setup_model [rank0]: msg = model_without_ddp.load_state_dict(state_dict, strict=False) [rank0]:...

Hello, I tried running the video text retrieval demo and I'm running into this error: ``` File "/home/saumya/miniconda3/lib/python3.12/site-packages/torch/utils/checkpoint.py", line 481, in checkpoint return CheckpointFunction.apply(function, preserve, *args) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/saumya/miniconda3/lib/python3.12/site-packages/torch/autograd/function.py", line...

Hi InternVideo2 team! Could you please share the code you use to extract the multi-modal features? I'd like to use the models to extract features from my own dataset. Thanks...

I would like to use InternVideo2.5 to extract video embeddings. Could you provide a reference script for extracting embeddings, specifically the `hidden_states[-1]` from the LLM's `hidden_states`? Thank you!

Thank you for this video model! I have one question. Is all the temporal modeling in InternVideo2.5 offloaded to the LLM? That is how it appears from the demo provided...

Hi, when I try to run `sh eval_msrvtt.sh`, I get the following error: [rank0]: File "/workspace/InternVideo2/multi_modality/tasks/pretrain.py", line 315, in main [rank0]: train_loaders, test_name2loaders, train_media_types = setup_dataloaders( [rank0]: File...

Hi 👋 Thank you for your great work! I'd love to reproduce your results for my future research, but I'm having trouble downloading the VideoMAE feature from the Baidu link...