InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
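Hello authors, I saw in your paper that BEATs is used to encode audio for modality alignment, but I can't find the corresponding code.
I want to apply InternVideo2 to my own Chinese retrieval dataset. The closest setup I found is the internvideo2_clip model used for vatex_cn, but I have some questions about loading it. 1. Which ckpts need to be loaded in total? As far as I can tell, these are the components: chinese_alpaca_lora_7b, InternVideo2-stage2_1b-224p-f4.pt, 1B_clip.pth, internvl_c_13b_224px.pth. My config is as follows: tokenizer_path="chinese_alpaca_lora_7b", vision_ckpt_path="OpenGVLab__InternVideo2-Stage2_1B-224p-f4/InternVideo2-stage2_1b-224p-f4.pt", load_vision_ckpt_from_internvideo2_stage2=True, text_ckpt_path="internvl_c_13b_224px.pth" extra_ckpt_path="OpenGVLab__InternVideo2-CLIP-1B-224p-f8/1B_clip.pth" However, I can't find where 1B_clip.pth should be configured: should I set pretrained_path directly in the config, as in the demo, or set extra_ckpt_path in the config file? 2. I am not sure whether the ckpt loading code has a problem. The ranking I end up with looks like this, which is not what I expected. I am currently setting extra_ckpt_path in the config file, i.e. the configuration from item 1.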
Hi InternVideo Team, I tried to run pretraining with 1B-41 as the pretrained model, but I hit the following error. How can I solve this?
Can you help me? I installed flash-attn 2.6.3 and ran cd csrc/layer_norm && pip install . in order to use InternVideo2, but when I run: from flash_attn.ops.rms_norm import DropoutAddRMSNorm I meet...
First of all, really amazing work, congratulations! I have an issue when trying to fine-tune on the MSRVTT dataset. If I understand correctly, I could fine tune...
Hello, In the [paper](https://arxiv.org/pdf/2403.15377), you say "ActionFormer [Anne Hendricks et al., 2017] is used as the detection head" and then give Hendricks et al.'s paper as reference. But Hendricks et...
Hi, thanks for the nice work. How can we fine-tune the stage2 model to reproduce the results reported in Table 10?
PYTHONPATH: :/opt/tiger/arnold/arnold_entrypoint:/usr/bin/srun:/opt/tiger/arnold_toolbox:/opt/tiger/api_common:/opt/tiger/load:/opt/tiger/studio_loader:/opt/tiger/arnold_toolbox:/opt/tiger/rh2:/opt/tiger/rh2:/opt/tiger/pyutil:/python:/python/lib/py4j-0.10.9-src.zip:/opt/tiger/rh2:/opt/tiger/load:/opt/tiger/arnold_toolbox:/opt/tiger/api_common:/opt/tiger/pyutil:/opt/tiger/arnold/arnold_entrypoint:/opt/tiger/studio_loader
which python: /usr/bin/python
PYTHONPATH: :/opt/tiger/arnold/arnold_entrypoint:/usr/bin/srun:/opt/tiger/arnold_toolbox:/opt/tiger/api_common:/opt/tiger/load:/opt/tiger/studio_loader:/opt/tiger/arnold_toolbox:/opt/tiger/rh2:/opt/tiger/rh2:/opt/tiger/pyutil:/python:/python/lib/py4j-0.10.9-src.zip:/opt/tiger/rh2:/opt/tiger/load:/opt/tiger/arnold_toolbox:/opt/tiger/api_common:/opt/tiger/pyutil:/opt/tiger/arnold/arnold_entrypoint:/opt/tiger/studio_loader:/usr/bin/python:.
srun: error: resolve_ctls_from_dns_srv: res_nsearch error: Unknown host
srun: error: fetch_config: DNS SRV lookup failed
srun: error: _establish_config_source: failed to fetch config
srun: fatal:...
> setting model.eval() in demo code
> setting correct model path in internvideo2_stage2_config.py in two places: model.vision_encoder.pretrained and pretrained_path
> I have made the changes based on the advice and...
The WeChat group is full, so I can't join.