InternVideo issues

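Is there example code for spatio-temporal action localization with InternVideo2.5?

I would like the model to output the coordinates of a given action in every frame. I tried prompting it to output coordinates, but it replied that it cannot output pixel coordinate values. How is this implemented in the paper? Could anyone share reference code or ideas? Thanks.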

Where does the msrvtt_1k_test dataset annotation list come from?

3

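Hello, I downloaded MSRVTT from the address given in the instructions, and it contains many test_list files. Which one is used? After extraction, the MSRVTT directory contains: annotation, high-quality, structured-symlinks, videos. Which file in which folder is test_1k? Is it MSRVTT/structured-symlinks/val_list_jsfusion.txt?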

mazhengyu8282

Error when fine-tuning stage2 of InternVideo2 with num_frames 12

When I try to fine-tune stage2 of InternVideo2 with num_frames 12, I get the error below:
```python
[rank0]: File "/root/nginx/multi_modality/tasks/shared_utils.py", line 192, in setup_model
[rank0]: msg = model_without_ddp.load_state_dict(state_dict, strict=False)
[rank0]:...
```

Eliza-and-black

CUDA illegal memory access

Hello, I tried running the video text retrieval demo and I'm running into this error:
```
File "/home/saumya/miniconda3/lib/python3.12/site-packages/torch/utils/checkpoint.py", line 481, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/saumya/miniconda3/lib/python3.12/site-packages/torch/autograd/function.py", line...
```

bhavnasud821

extract multi-modal features using InternVideo2

Hi InternVideo2 team! Could you please share code showing how you extract the multi-modal features? I'd like to use the models to extract features from my own dataset. Thanks...

xeroqin

Extracting Model Embeddings from Existing Code

I would like to use InternVideo2.5 to extract video embeddings. Could you provide a reference script for extracting embeddings, specifically the `hidden_states[-1]` from the LLM's `hidden_states`? Thank you!

xinyanghuang7

InternVideo2.5 Temporal Modeling

Thank you for this video model! I had one question. Is all the temporal modeling in InternVideo2.5 offloaded to the LLM? This is how it appears from the demo provided...

arushirai1

NotImplementedError: We need json file!!!

Hi, when I try to run `sh eval_msrvtt.sh`, I get the following error:
[rank0]: File "/workspace/InternVideo2/multi_modality/tasks/pretrain.py", line 315, in main
[rank0]: train_loaders, test_name2loaders, train_media_types = setup_dataloaders(
[rank0]: File...

Shehz

Is there any other link for InternVideo ActivityNet temporal action localization features?

1

Hi 👋 Thank you for your great work! I'd love to reproduce your results for my future research, but I'm having trouble downloading the VideoMAE feature from the Baidu link...

Yuuraa

Could you please provide more examples to do inference on the different tasks in the paper?

4

For example, temporal grounding on QVHighlights and Charades-STA.

buaalyx
