LLaVA-NeXT
How to access video data in LLaVA-OneVision?
Thank you for your contribution.
Under the Hugging Face lmms-lab/LLaVA-OneVision-Data repo, I can only find single-image data. In your scripts/train/README.md you state that the video data incorporates Youcook2 (32267), Charades (19851), NextQA (7653), activitynet (5153), and ego4d (671), but under the Hugging Face lmms-lab organization I cannot find the ego4d dataset, and Youcook2 only has val and test splits, which is fewer samples than reported in the paper (41.9k).
Does anyone know how to find the video data annotated in the LLaVA format?
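For reference, here is a minimal sketch of how I have been checking what is actually published in the repo, using the `datasets` library. The substring filters for video sub-datasets are only guesses at how the configs might be named, not confirmed names:

```python
# Sketch: list the sub-dataset configs and splits published under
# lmms-lab/LLaVA-OneVision-Data on the Hugging Face Hub.
from datasets import get_dataset_config_names, get_dataset_split_names

REPO = "lmms-lab/LLaVA-OneVision-Data"

configs = get_dataset_config_names(REPO)
print(f"{len(configs)} configs found")

# Look for video-related configs (e.g. Youcook2, Charades) and list their
# splits; the keywords below are assumptions about how configs are named.
for name in configs:
    if any(key in name.lower() for key in ("youcook", "charades", "nextqa", "activitynet", "ego4d")):
        splits = get_dataset_split_names(REPO, name)
        print(name, "->", splits)
```

In my case this only surfaces the single-image configs, which is why I am asking where the video portion lives.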
I am also a little confused about the relationship between the OneVision data and M4-Instruct-Data. Does OneVision contain all of M4-Instruct?
I assume the 560K multi-image samples are a subset of M4-Instruct, but M4-Instruct contains 615K multi-image samples, so how can I find the 560K subset? Also, the lmms-lab/LLaVA-OneVision-Data repo actually contains the "Single-Image 3.2M" data rather than the "OneVision 1.6M", so how can I find the "800K higher-quality data re-sampled from the previous stage"?
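To compare the released annotations against the 615K / 560K figures, I have been inspecting the M4-Instruct-Data repo directly. This is only a sketch: it assumes the annotations are shipped as JSON files in the dataset repo, and the file names are not verified:

```python
# Sketch: list the files in lmms-lab/M4-Instruct-Data and count the entries
# in one annotation JSON, to compare against the sample counts in the paper.
import json
from huggingface_hub import list_repo_files, hf_hub_download

REPO = "lmms-lab/M4-Instruct-Data"

files = list_repo_files(REPO, repo_type="dataset")
json_files = [f for f in files if f.endswith(".json")]
print(json_files)

# Download the first annotation file (assumed to be a JSON list of samples)
# and count how many entries it contains.
if json_files:
    path = hf_hub_download(REPO, json_files[0], repo_type="dataset")
    with open(path) as fh:
        data = json.load(fh)
    print(json_files[0], "->", len(data), "samples")
```

This at least shows what is distributed, but it does not tell me which 560K of the 615K multi-image samples were kept for OneVision.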
Hi xsgldhy, have you solved this problem?