I spent the last 2 hours finally getting video retrieval working... here is the mistake I made! (in case anyone else makes the same mistake as me -_-)
Hello,
First, I want to thank the authors for this work. I think I saw your poster at ICLR in May. Back then I was not working on anything related to video retrieval. Oh well.
So yeah, I had some trouble getting the Jupyter notebook in the demo folder working because there are a few errors here and there.
I then made some adjustments to make the code run, but I noticed that the outputs were gibberish. In particular, the text embeddings seemed to be wrong. After some investigation, I found my error.
First, I got an error when I ran this line:
tokenizer = BertTokenizer.from_pretrained(config.model.text_encoder.pretrained, local_files_only=True)
I assumed it was because I didn't have the right files, so I removed local_files_only=True. Then I got an error about the vocab, caused by my having a newer version of transformers.
I then noticed that the vocab error goes away if I change from models.backbones.bert.tokenization_bert import BertTokenizer to from transformers import BertTokenizer, so I changed it. Apparently, this causes the generated text embeddings to be incorrect.
Downgrading transformers to 4.28.1 fixed the error.
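If you want to keep the offline behaviour when the files are already cached and only download when they are missing, a small wrapper can try both. This is a sketch, not code from the repo: `load_tokenizer` is a hypothetical helper that works with any class exposing a Hugging Face style `from_pretrained`, and it assumes a missing local copy surfaces as `OSError`, which is how transformers usually reports it (worth double-checking for your version).

```python
from typing import Any


def load_tokenizer(tokenizer_cls: Any, name_or_path: str) -> Any:
    """Prefer locally cached files; fall back to downloading them.

    Assumption: `tokenizer_cls.from_pretrained(..., local_files_only=True)`
    raises OSError when the files are not available locally.
    """
    try:
        # Offline attempt: only use files already on disk / in the cache.
        return tokenizer_cls.from_pretrained(name_or_path, local_files_only=True)
    except OSError:
        # Nothing cached locally: allow a download instead.
        return tokenizer_cls.from_pretrained(name_or_path)
```

In the notebook this would read `tokenizer = load_tokenizer(BertTokenizer, config.model.text_encoder.pretrained)`, keeping whichever BertTokenizer import the repo expects.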
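Because this swap fails silently (no error, just bad embeddings), one cheap check is to compare both tokenizers on a few captions before trusting any retrieval scores. A minimal sketch, assuming both tokenizers expose a Hugging Face style `encode(text)` returning token ids and that you run it in an environment where both load; `tokenizer_mismatches` is a hypothetical helper. If the ids differ, the text encoder is being fed different input than expected; if they match, the problem lies elsewhere.

```python
from typing import Iterable, List, Protocol


class Encoder(Protocol):
    """Anything with an HF-style encode(text) -> list of token ids."""

    def encode(self, text: str) -> List[int]: ...


def tokenizer_mismatches(tok_a: Encoder, tok_b: Encoder, texts: Iterable[str]) -> List[str]:
    """Return the sample texts on which the two tokenizers produce different ids."""
    return [t for t in texts if list(tok_a.encode(t)) != list(tok_b.encode(t))]
```

An empty list for a handful of typical captions means the two tokenizers agree on those inputs.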
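To avoid silently running with the wrong version again, the notebook could check the installed transformers version up front. A sketch using only the standard library (`importlib.metadata`, Python 3.8+); the pin is the 4.28.1 reported above, and the helper names are hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

PINNED_TRANSFORMERS = "4.28.1"  # version reported to work in this thread


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def transformers_matches_pin(pinned: str = PINNED_TRANSFORMERS) -> bool:
    """True only if transformers is installed at exactly the pinned version."""
    return installed_version("transformers") == pinned
```

For example, a first cell with `assert transformers_matches_pin(), installed_version("transformers")`; `pip install transformers==4.28.1` installs the pinned version.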
Thanks
I also noticed a lot of people running into the relative imports error; you can fix it by running this code from InternVideo/InternVideo2/multi_modality/:
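import sys
import os
# Make the packages under multi_modality/ (demo, models, utils, ...) importable;
# this assumes the notebook is running from multi_modality/.
sys.path.append(os.getcwd())
import io
import numpy as np
import cv2
import torch
from demo.config import Config, eval_dict_leaf
from demo.utils import retrieve_text, _frame_from_video, setup_internvideo2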
You will still need to edit some imports in demo/utils.py and demo/config.py, but other than that, it should work.
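`sys.path.append(os.getcwd())` only works if the Jupyter kernel was started in multi_modality/. A sketch that pins the directory explicitly instead; `add_repo_root` is a hypothetical helper and the path in the usage line is a placeholder for your own checkout:

```python
import sys
from pathlib import Path


def add_repo_root(root: Path) -> None:
    """Put `root` at the front of sys.path exactly once."""
    root_str = str(root.resolve())
    if root_str not in sys.path:
        # Front of the list: searched before site-packages.
        sys.path.insert(0, root_str)
```

Usage: `add_repo_root(Path("/path/to/InternVideo/InternVideo2/multi_modality"))`. Inserting at the front rather than appending means the repo's own `models`/`utils` packages are found before any installed packages with the same names (as long as those were not imported earlier).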
I tried this, but I still got the error: "ValueError: attempted relative import beyond top-level package"
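This error happens because, once multi_modality/ itself is on sys.path, `models` is imported as a top-level package, so a `from ..utils ...` inside it tries to climb one level above the top. The sketch below rebuilds a tiny throwaway version of that layout (`mm_models`/`mm_utils` are hypothetical stand-ins for `models`/`utils`) and tries a relative and an absolute import. Recent Python versions raise `ImportError` with the same message; older ones raise the `ValueError` quoted above.

```python
import importlib
import sys
import tempfile
from pathlib import Path
from typing import Tuple


def try_both_import_styles() -> Tuple[str, str]:
    """Build a throwaway two-package layout and import one module each way."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for pkg in ("mm_models", "mm_utils"):
            (root / pkg).mkdir()
            (root / pkg / "__init__.py").write_text("")
        (root / "mm_utils" / "distributed.py").write_text("def get_rank():\n    return 0\n")
        # Same shape as the failing import: ".." climbs above the top-level package.
        (root / "mm_models" / "relative.py").write_text(
            "from ..mm_utils.distributed import get_rank\n")
        # Absolute form: resolved through sys.path instead.
        (root / "mm_models" / "absolute.py").write_text(
            "from mm_utils.distributed import get_rank\n")

        sys.path.insert(0, tmp)  # parent dir on sys.path, like multi_modality/
        importlib.invalidate_caches()
        try:
            try:
                importlib.import_module("mm_models.relative")
                relative = "ok"
            except (ImportError, ValueError) as exc:  # ValueError on older Pythons
                relative = f"{type(exc).__name__}: {exc}"
            importlib.import_module("mm_models.absolute")
            absolute = "ok"
        finally:
            sys.path.remove(tmp)
            # Forget the throwaway modules so the function can be called again.
            for name in [m for m in sys.modules if m.split(".")[0] in ("mm_models", "mm_utils")]:
                del sys.modules[name]
    return relative, absolute
```

The first element holds the exception type and message from the relative import; the second shows that the absolute form imports fine from the same layout.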
Maybe you have already solved the problem. If not, try this:
In multi_modality/models/criterions.py, change
from ..utils.distributed import get_rank, get_world_size
from ..utils.easydict import EasyDict
to:
from utils.distributed import get_rank, get_world_size
from utils.easydict import EasyDict
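Other files may have the same pattern (demo/utils.py and demo/config.py were mentioned earlier). A rough sketch for finding them: it parses each .py file with `ast` and flags `from ..x import y` statements whose leading dots climb above the directory that is on sys.path. It assumes every subdirectory is a package, and the function names are hypothetical:

```python
import ast
from pathlib import Path, PurePosixPath
from typing import Iterable, List, Tuple

Hit = Tuple[str, int, str]  # (file relative to the sys.path root, line, import target)


def too_deep_relative_imports(sources: Iterable[Tuple[str, str]]) -> List[Hit]:
    """Flag relative imports that climb above the sys.path root.

    `sources` yields (relative path, file text) pairs. A file at a/b/c.py
    lives in package "a.b", so it may use at most two leading dots.
    """
    hits: List[Hit] = []
    for rel_path, text in sources:
        depth = len(PurePosixPath(rel_path).parent.parts)
        for node in ast.walk(ast.parse(text, filename=rel_path)):
            if isinstance(node, ast.ImportFrom) and node.level > depth:
                hits.append((rel_path, node.lineno, "." * node.level + (node.module or "")))
    return sorted(hits)


def scan_tree(root: Path) -> List[Hit]:
    """Run the check over every .py file under `root` (e.g. multi_modality/)."""
    return too_deep_relative_imports(
        (p.relative_to(root).as_posix(), p.read_text(encoding="utf-8"))
        for p in sorted(root.rglob("*.py"))
    )
```

Run from multi_modality/, `for hit in scan_tree(Path(".")): print(hit)` lists each (file, line, import) that would need the same relative-to-absolute change.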