LLaMA-VID About the json in stage2 and stage3

About the json in stage2 and stage3

Open liziming5353 opened this issue 10 months ago • 1 comments

Why does the data in stage2 and 3 contains pure text Q&A without images or videos?

Apr 02 '24 04:04 liziming5353

According to DeepSeek-VL,

Maintaining a significant proportion of language data—specifically, at least 70%—is essential to preserve the integrity of language knowledge within the model.

Apr 04 '24 07:04 Becomebright

LLaMA-VID LLaMA-VID copied to clipboard

About the json in stage2 and stage3

LLaMA-VID
LLaMA-VID copied to clipboard