InternVideo: I spent the last 2 hours finally getting video retrieval working... here is the mistake I made! (in case anyone else makes the same mistake as me -_-)

Hello,

First, I want to thank the authors for this work. I think I saw your poster at ICLR in May. Back then I was not working on anything related to video retrieval. Oh well.

I had some trouble getting the Jupyter notebook in the demo folder working because there are a few errors here and there. I made some adjustments to get the code to run, but then noticed that the outputs were gibberish. In particular, the text embeddings seemed to be wrong. After some investigation, I found my error:

I first got an error when running this line:

tokenizer = BertTokenizer.from_pretrained(config.model.text_encoder.pretrained, local_files_only=True)

I assumed it was because I didn't have the right files locally, so I removed local_files_only=True. Then I got an error about vocab, caused by my having a newer version of transformers.

I then observed that the vocab error goes away when changing

from models.backbones.bert.tokenization_bert import BertTokenizer

to

from transformers import BertTokenizer

so I changed it. Apparently this causes the generated text embeddings to be incorrect.

Downgrading transformers to 4.28.1 fixes the error.
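For anyone who wants to catch this before running the notebook, here is a minimal sketch of a version check. It only assumes that 4.28.1 is the version reported to work in this post. The helper names (parse_version, is_newer_than_known_good, installed_transformers_version) are made up for illustration and are not part of InternVideo or transformers.

```python
from importlib import metadata

KNOWN_GOOD = "4.28.1"  # version reported to work in this thread


def parse_version(version):
    """Turn '4.28.1' into (4, 28, 1); stop at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_newer_than_known_good(installed, known_good=KNOWN_GOOD):
    """True if the installed version is newer than the one known to work."""
    return parse_version(installed) > parse_version(known_good)


def installed_transformers_version():
    """Return the installed transformers version string, or None if absent."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if found is None:
        print("transformers is not installed")
    elif is_newer_than_known_good(found):
        print(f"transformers {found} is newer than {KNOWN_GOOD}; "
              "the repo's BertTokenizer may fail with a vocab error")
    else:
        print(f"transformers {found} should be fine")
```

This does not fix anything by itself; it just warns early instead of letting the demo silently produce wrong text embeddings after a tokenizer swap.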

Thanks

Sep 16 '24 00:09 zmy1116

I also noticed a lot of people running into the relative imports error. You can fix it by running this code from InternVideo/InternVideo2/multi_modality/:

import sys
import os

# Make the current directory (multi_modality/) importable so that `demo` resolves as a package.
sys.path.append(os.getcwd())

import io
import cv2
import numpy as np
import torch

from demo.config import Config, eval_dict_leaf
from demo.utils import retrieve_text, _frame_from_video, setup_internvideo2

You will still need to edit some imports in demo/utils.py and demo/config.py, but other than that it should work.
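If you do not want to depend on the current working directory, a small variation is to add the folder explicitly. This is only a sketch: add_to_sys_path is a hypothetical helper, and MULTI_MODALITY_DIR is a placeholder that has to be adjusted to wherever the repo is checked out.

```python
import os
import sys


def add_to_sys_path(directory):
    """Put `directory` at the front of sys.path (once) and return its absolute path."""
    path = os.path.abspath(directory)
    if path not in sys.path:
        sys.path.insert(0, path)
    return path


# Placeholder location of the checkout, relative to where the script runs; adjust as needed.
MULTI_MODALITY_DIR = os.path.join("InternVideo", "InternVideo2", "multi_modality")

if __name__ == "__main__":
    added = add_to_sys_path(MULTI_MODALITY_DIR)
    print("importable root:", added)
```

Inserting at the front rather than appending is a design choice: it makes the repo's own utils and models folders win if something else with the same name happens to be installed.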

Sep 23 '24 15:09 qingy1337

But I still got the error: "ValueError: attempted relative import beyond top-level package"

Apr 25 '25 13:04 JustinhoCHN

Maybe you have already solved the problem. If not, refer to this method.

In multi_modality/models/criterions.py, change:

from ..utils.distributed import get_rank, get_world_size
from ..utils.easydict import EasyDict

to:

from utils.distributed import get_rank, get_world_size
from utils.easydict import EasyDict
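For context on why this change helps: once multi_modality/ itself is on sys.path, models and utils are both top-level packages, so a `..` import from inside models tries to climb above the top level. The sketch below reproduces that in a throwaway directory. The toy_models and toy_utils packages are invented for the demonstration and only mimic the layout. Depending on the Python version, the failure is raised as ValueError (as in the error above) or ImportError, so both are caught.

```python
import importlib
import os
import sys
import tempfile


def write(path, text=""):
    """Create a file (and its parent folders) with the given content."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)


def run_demo():
    """Build two sibling top-level packages and try both import styles."""
    results = {}
    with tempfile.TemporaryDirectory() as root:
        write(os.path.join(root, "toy_utils", "__init__.py"))
        write(os.path.join(root, "toy_utils", "easydict.py"), "VALUE = 42\n")
        write(os.path.join(root, "toy_models", "__init__.py"))
        # Same shape as the original import in criterions.py.
        write(os.path.join(root, "toy_models", "relative.py"),
              "from ..toy_utils.easydict import VALUE\n")
        # Same shape as the suggested fix.
        write(os.path.join(root, "toy_models", "absolute.py"),
              "from toy_utils.easydict import VALUE\n")

        sys.path.insert(0, root)
        importlib.invalidate_caches()
        try:
            try:
                importlib.import_module("toy_models.relative")
                results["relative"] = "ok"
            except (ImportError, ValueError) as exc:
                results["relative"] = str(exc)
            results["absolute"] = importlib.import_module("toy_models.absolute").VALUE
        finally:
            # Leave the interpreter state as we found it.
            sys.path.remove(root)
            for name in [n for n in sys.modules
                         if n.split(".")[0] in ("toy_models", "toy_utils")]:
                del sys.modules[name]
    return results


if __name__ == "__main__":
    for style, outcome in run_demo().items():
        print(style, "->", outcome)
```

The relative variant fails with the "beyond top-level package" message, while the absolute one imports fine. The same reasoning would apply to any other file under models/ that uses `..utils`, although I have not checked which files in the repo do.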

Jun 05 '25 08:06 Jing-Fu
