LLaVA-NeXT
How to access video data in LLaVA-OneVision?
Thank you for your contribution.
Under the Hugging Face lmms-lab/LLaVA-OneVision-Data repo, I can only find single-image data. In your scripts/train/README.md you state that the video data incorporates Youcook2 (32267), Charades (19851), NextQA (7653), activitynet (5153), and ego4d (671), but under the Hugging Face lmms-lab organization I cannot find the ego4d dataset, and Youcook2 only has val and test splits, which is fewer samples than reported in the paper (41.9k).
Does anyone know how to find the video data annotated in the LLaVA format?
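For reference, here is a minimal sketch of how I have been checking what is actually published in the repo, using the `datasets` library. The substring filters for video sub-datasets are only guesses at how the configs might be named, not confirmed names:

```python
# Sketch: list the sub-dataset configs and splits published under
# lmms-lab/LLaVA-OneVision-Data on the Hugging Face Hub.
from datasets import get_dataset_config_names, get_dataset_split_names

REPO = "lmms-lab/LLaVA-OneVision-Data"

configs = get_dataset_config_names(REPO)
print(f"{len(configs)} configs found")

# Look for video-related configs (e.g. Youcook2, Charades) and list their
# splits; the keywords below are assumptions about how configs are named.
for name in configs:
    if any(key in name.lower() for key in ("youcook", "charades", "nextqa", "activitynet", "ego4d")):
        splits = get_dataset_split_names(REPO, name)
        print(name, "->", splits)
```

In my case this only surfaces the single-image configs, which is why I am asking where the video portion lives.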
I am also a little confused about the relationship between the OneVision data and M4-Instruct-Data. Does OneVision contain all of M4-Instruct?
I assume the 560K multi-image samples are a subset of M4-Instruct, but M4-Instruct contains 615K multi-image samples, so how can I find the 560K subset? Also, the lmms-lab/LLaVA-OneVision-Data repo actually contains the "Single-Image 3.2M" data rather than the "OneVision 1.6M", so how can I find the "800K higher-quality data re-sampled from the previous stage"?
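To compare the released annotations against the 615K / 560K figures, I have been inspecting the M4-Instruct-Data repo directly. This is only a sketch: it assumes the annotations are shipped as JSON files in the dataset repo, and the file names are not verified:

```python
# Sketch: list the files in lmms-lab/M4-Instruct-Data and count the entries
# in one annotation JSON, to compare against the sample counts in the paper.
import json
from huggingface_hub import list_repo_files, hf_hub_download

REPO = "lmms-lab/M4-Instruct-Data"

files = list_repo_files(REPO, repo_type="dataset")
json_files = [f for f in files if f.endswith(".json")]
print(json_files)

# Download the first annotation file (assumed to be a JSON list of samples)
# and count how many entries it contains.
if json_files:
    path = hf_hub_download(REPO, json_files[0], repo_type="dataset")
    with open(path) as fh:
        data = json.load(fh)
    print(json_files[0], "->", len(data), "samples")
```

This at least shows what is distributed, but it does not tell me which 560K of the 615K multi-image samples were kept for OneVision.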
Hi xsgldhy, have you solved this problem?