Yi Wang


It **does not** contain videos from the datasets you mentioned. We clarified this in Sec. 3.1 (data curation) as follows: "We ensure the uniqueness of our dataset by creating a database of...

You can refer to our Hugging Face implementation instead, which is more convenient to use: [internvideo2-chat-hd](https://huggingface.co/OpenGVLab/InternVideo2_chat_8B_HD), [internvideo2-chat](https://huggingface.co/OpenGVLab/InternVideo2-Chat-8B).
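
A minimal sketch of loading the HD checkpoint with `transformers` is below. The generation interface exposed by the remote code (the commented-out `model.chat(...)` call and its arguments) is an assumption here; check the model card for the exact signature and the expected video preprocessing.

```python
# Minimal sketch: loading InternVideo2_chat_8B_HD from Hugging Face.
# The chat() call below is an assumption about the remote-code interface;
# see the model card for the actual signature and video preprocessing.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "OpenGVLab/InternVideo2_chat_8B_HD"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to(device).eval()

# `video_tensor` would be frames preprocessed as the model card describes;
# the call below is illustrative only, not the confirmed API.
# response, history = model.chat(
#     tokenizer,
#     "",                                   # system / context string
#     "Describe the video step by step.",   # user instruction
#     media_type="video",
#     media_tensor=video_tensor,
#     do_sample=False,
# )
# print(response)
```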

You can tune the test prompts and other settings yourself. The input frame count and resolution were probably not adjusted much, but the prompts may have been. The current benchmarks are really a stopgap: their results are not very consistent, and tweaking the prompt alone can shift performance by a few points.

Stage 2 mainly does contrastive learning and has no captioning loss, so it cannot generate video captions unless you follow CoCa and attach a text decoder head. For video captioning, we recommend using the stage 3 model, or our VideoChat-Flash or InternVideo2.5 work, i.e., doing it with an MLLM.
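
If you do want to go the CoCa route, the rough shape is a causal text decoder that cross-attends to the stage-2 video features and is trained with a next-token captioning loss. Below is a minimal PyTorch sketch, not the InternVideo2 code; all module sizes and the assumption that the video tokens are already projected to `d_model` are placeholders.

```python
# Sketch of a CoCa-style captioning head on top of a contrastive video encoder.
# Not the InternVideo2 implementation; dimensions are placeholders.
import torch.nn as nn

class CaptionDecoderHead(nn.Module):
    def __init__(self, vocab_size, d_model=768, n_layers=4, n_heads=12):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.decoder = nn.TransformerDecoder(layer, num_layers=n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, text_ids, video_tokens):
        # text_ids:     [B, L]    caption token ids (teacher forcing)
        # video_tokens: [B, N, d_model] features from the stage-2 encoder
        #               (project them to d_model first if they differ)
        x = self.token_emb(text_ids)
        causal_mask = nn.Transformer.generate_square_subsequent_mask(
            text_ids.size(1)
        ).to(text_ids.device)
        x = self.decoder(tgt=x, memory=video_tokens, tgt_mask=causal_mask)
        return self.lm_head(x)  # [B, L, vocab_size] logits

# Captioning loss would be the usual shifted next-token cross-entropy, e.g.
# torch.nn.functional.cross_entropy(logits[:, :-1].transpose(1, 2), text_ids[:, 1:])
```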

You can consider fine-tuning the stage 1 model in combination with VideoMAEv2's decoder. These components closely resemble autoencoders and have the potential to predict frames. However, it's important to assess...
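
As a rough illustration of that pairing, the sketch below attaches a lightweight MAE-style decoder to a pretrained video encoder and reconstructs frame patches. It is a placeholder under assumed shapes, not the VideoMAEv2 decoder itself; `stage1_encoder` and the masking/patch bookkeeping are left to the reader.

```python
# Sketch: fine-tuning a pretrained video encoder with a lightweight
# MAE-style decoder that reconstructs (predicts) frame patches.
# Shapes and the encoder itself are placeholders, not the VideoMAEv2 code.
import torch.nn as nn

class FramePredictionHead(nn.Module):
    def __init__(self, enc_dim=1024, dec_dim=512, patch_pixels=3 * 16 * 16, n_layers=2):
        super().__init__()
        self.proj = nn.Linear(enc_dim, dec_dim)
        layer = nn.TransformerEncoderLayer(d_model=dec_dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.to_pixels = nn.Linear(dec_dim, patch_pixels)

    def forward(self, encoder_tokens):
        # encoder_tokens: [B, N, enc_dim] features from the stage-1 encoder
        x = self.proj(encoder_tokens)
        x = self.decoder(x)
        return self.to_pixels(x)  # [B, N, patch_pixels] reconstructed patches

# Training step (reconstruction loss on the masked patches only), e.g.
# pred = head(stage1_encoder(video_clip))
# loss = torch.nn.functional.mse_loss(pred[mask], target_patches[mask])
```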

We have discovered that someone has shared a resized version of the InternVid subset (InternVid-10M-FLT) at [this link](https://opendatalab.org.cn/vd-foundation/InternVid-10M-FLT). It may be useful to you.

Could you be more specific, so that we can update the addresses you mentioned accordingly?

0.5–2 fps works well in most cases.
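
For reference, here is a small sketch of sampling frames at a target rate in that range. It assumes the `decord` package; any frame reader that exposes the native fps and frame count works the same way.

```python
# Sketch: sample frames from a video at a target rate (e.g. 1 fps).
# Assumes the `decord` package is installed.
import numpy as np
from decord import VideoReader

def sample_frames(path, target_fps=1.0):
    vr = VideoReader(path)
    native_fps = vr.get_avg_fps()
    step = max(int(round(native_fps / target_fps)), 1)
    indices = np.arange(0, len(vr), step)
    return vr.get_batch(indices).asnumpy()  # [T, H, W, C] uint8 frames

# frames = sample_frames("example.mp4", target_fps=2.0)
```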

Apologies for the incorrect citation, and thank you for bringing it to our attention. We will promptly correct the error in the paper on arXiv.

Apologies for the delayed response. Please refer to [this link](https://huggingface.co/datasets/OpenGVLab/InternVid/viewer/InternVid-18M-AES).
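
A minimal sketch of loading that subset with the `datasets` library is below; the config name is read off the viewer URL and the split name may need adjusting.

```python
# Sketch: loading the InternVid-18M-AES annotations with Hugging Face datasets.
# The config and split names are assumptions based on the viewer URL.
from datasets import load_dataset

ds = load_dataset("OpenGVLab/InternVid", "InternVid-18M-AES", split="train")
print(ds[0])  # one annotation record, e.g. clip id, timestamps, caption, score
```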