LLaVA-NeXT
Inconsistent sample numbers in LLaVA-NeXT dataset
Thanks for your great work. I'm downloading the LLaVA-NeXT instruction tuning data from lmms-lab/LLaVA-NeXT-Data. However, I find that there are around 779k samples in the parquet directory but only 738k samples in llava_next_raw_format/llava_next_raw_format_processed.json. Could you please explain the difference between them?
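
For reference, here is a minimal sketch of how the two copies could be compared once both are loaded. It assumes each sample carries a unique `id` field, which I have not verified for every record. The helper name `compare_ids` and the toy IDs are hypothetical; the real ID lists would come from the parquet files and the JSON file.

```python
from collections import Counter


def compare_ids(parquet_ids, json_ids):
    """Summarize how two collections of sample IDs differ.

    Assumption: every sample has an `id` value that identifies it
    in both the parquet files and the processed JSON file.
    """
    parquet_counts = Counter(parquet_ids)
    json_counts = Counter(json_ids)
    return {
        # Totals include repeated IDs, so they match raw row counts.
        "parquet_total": sum(parquet_counts.values()),
        "json_total": sum(json_counts.values()),
        # IDs present in one source but missing from the other.
        "only_in_parquet": sorted(set(parquet_counts) - set(json_counts)),
        "only_in_json": sorted(set(json_counts) - set(parquet_counts)),
        # IDs that occur more than once in the parquet data.
        "duplicated_in_parquet": sorted(
            k for k, n in parquet_counts.items() if n > 1
        ),
    }


# Toy data standing in for the real ID lists.
report = compare_ids(["a", "b", "b", "c"], ["a", "c", "d"])
for key, value in report.items():
    print(f"{key}: {value}")
```

If the extra ~41k parquet rows show up under `only_in_parquet` or `duplicated_in_parquet`, that would at least narrow down whether the gap comes from filtered samples or from repeated entries.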