LLaMA-VID icon indicating copy to clipboard operation
LLaMA-VID copied to clipboard

About the json in stage2 and stage3

Open liziming5353 opened this issue 10 months ago • 1 comments

Why does the data in stage2 and 3 contains pure text Q&A without images or videos?

liziming5353 avatar Apr 02 '24 04:04 liziming5353

According to DeepSeek-VL,

Maintaining a significant proportion of language data—specifically, at least 70%—is essential to preserve the integrity of language knowledge within the model.

Becomebright avatar Apr 04 '24 07:04 Becomebright