Inconsistent sample numbers in LLaVA-NeXT dataset

Open · niiickZ opened this issue 1 year ago · 0 comments

Thanks for your great work. I'm downloading the LLaVA-NeXT instruction-tuning data from lmms-lab/LLaVA-NeXT-Data. However, there are around 779k samples in the parquet directory but only 738k samples in llava_next_raw_format/llava_next_raw_format_processed.json. Could you please explain the difference between them?
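One way to see where the roughly 41k-sample gap comes from is to collect the sample IDs from both sources and check for duplicates and for IDs that appear in only one of them. The sketch below rests on a few assumptions that this thread does not confirm. Both the parquet rows and the JSON records are assumed to carry an `id` field. The parquet files are assumed to sit in one local directory. `pyarrow` must be installed for the parquet side. The function names are hypothetical helpers, not part of the dataset's tooling.

```python
import glob
import json
import os
from collections import Counter


def load_parquet_ids(parquet_dir, id_key="id"):
    """Collect sample IDs from every *.parquet file in a directory.

    Assumes each file has a column named `id_key` (hypothetical; check the schema).
    """
    import pyarrow.parquet as pq  # imported lazily so the JSON helpers work without pyarrow

    ids = []
    for path in sorted(glob.glob(os.path.join(parquet_dir, "*.parquet"))):
        # Read only the ID column instead of the full rows (which may hold image bytes).
        table = pq.read_table(path, columns=[id_key])
        ids.extend(table.column(id_key).to_pylist())
    return ids


def load_json_ids(json_path, id_key="id"):
    """Return the sample IDs from a JSON file containing a list of records."""
    with open(json_path, "r", encoding="utf-8") as f:
        records = json.load(f)
    return [rec[id_key] for rec in records]


def compare_ids(ids_a, ids_b):
    """Summarize totals, duplicated IDs, and IDs unique to each collection."""
    count_a, count_b = Counter(ids_a), Counter(ids_b)
    return {
        "total_a": len(ids_a),
        "total_b": len(ids_b),
        # Extra copies beyond the first occurrence of each repeated ID.
        "duplicates_a": sum(c - 1 for c in count_a.values() if c > 1),
        "duplicates_b": sum(c - 1 for c in count_b.values() if c > 1),
        # Distinct IDs present in one collection but not the other.
        "only_in_a": len(set(count_a) - set(count_b)),
        "only_in_b": len(set(count_b) - set(count_a)),
    }
```

A possible call, assuming the dataset was downloaded into the current directory with its original layout, is `compare_ids(load_parquet_ids("parquet"), load_json_ids("llava_next_raw_format/llava_next_raw_format_processed.json"))`. If `duplicates_a` accounts for most of the gap, the extra parquet rows are repeats; if `only_in_a` does, they are samples missing from the JSON.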

niiickZ, Sep 24 '24 03:09