DIV and FLT stand for diversity sampling and filtering, respectively. Specifically, for DIV (diversity sampling), we aim to sample video clips from all available long videos to maximize data diversity....
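As a rough illustration of the DIV idea, here is a minimal sketch in Python. The function name `diverse_sample` and the data layout (each long video carrying a list of pre-cut candidate clips) are assumptions for the example, not the actual pipeline:

```python
import random

def diverse_sample(long_videos, clips_per_video=4, seed=0):
    """Take a few clips from every long video so that no single
    source dominates the sampled clip pool (diversity sampling)."""
    rng = random.Random(seed)
    sampled = []
    for video in long_videos:
        clips = video["clips"]  # candidate clips cut from this long video
        k = min(clips_per_video, len(clips))
        sampled.extend(rng.sample(clips, k))
    return sampled
```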
Apologies for the delayed response.
1. You can access the full version of InternVid [here](https://huggingface.co/datasets/OpenGVLab/InternVid-Full).
2. No, the aesthetic dataset does not consider the CLIP score. When filtering by aesthetic scores,...
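For clarity, a minimal sketch of what point 2 means in code, assuming each record carries a precomputed `aesthetic_score` field (the field name and the threshold value are illustrative assumptions):

```python
def filter_by_aesthetics(records, threshold=4.5):
    """Keep clips by aesthetic score alone; the CLIP similarity
    score is deliberately not consulted, per the answer above.
    threshold=4.5 is an illustrative value only."""
    return [r for r in records if r["aesthetic_score"] >= threshold]
```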
It is a legacy issue. We will see what we can do.
Our team is preparing it. However, due to our tight schedule, the precise release date for the full version is currently uncertain.
Apologies for the delayed response. You can access the complete version of InternVid [here](https://huggingface.co/datasets/OpenGVLab/InternVid-Full).
Thank you for your feedback. We will address the issues you mentioned soon.
This is caused by the missing installation of some libraries shipped with flash attention. You need to get the flash attention source code and then install layer_norm as in...
If your machine does not support installing these libraries, you can change the settings in config.py so that half precision (fp16) and bf16 are not used. In that...
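As a rough sketch of that kind of change (the option names below are assumptions; check the actual keys in config.py):

```python
# config.py (illustrative excerpt; actual option names may differ)
use_flash_attn = False       # avoid the flash-attn kernels entirely
use_half_precision = False   # run in full fp32 instead of fp16
use_bf16 = False             # likewise disable bfloat16
```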
You can refer to [this instruction](https://github.com/OpenGVLab/InternVideo/blob/main/InternVideo2/multi_modality/INSTALL.md#key-dependencies-installation-for-flashattention2) to install the dependencies needed to run flash-attn with layernorm and other components. If your hardware does not support installing flash-attn and its dependencies, you can...
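After installation, a quick sanity check like the following can confirm the optional kernels are importable (a sketch; `dropout_layer_norm` is assumed to be the extension module built from flash-attention's `csrc/layer_norm`):

```python
# Check that flash-attn and its fused layer_norm extension built correctly.
try:
    import flash_attn
    import dropout_layer_norm  # installed from flash-attention/csrc/layer_norm
    print("flash-attn", flash_attn.__version__, "with fused layer_norm: OK")
except ImportError as e:
    print("missing optional dependency:", e)
```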
Sorry for the late reply. Please refer to [this branch](https://github.com/OpenGVLab/InternVideo/tree/grounding_evaluation) of the repo for grounding evaluations.