LLaMA-VID icon indicating copy to clipboard operation
LLaMA-VID copied to clipboard

Multi-image inference

Open g-h-chen opened this issue 11 months ago • 1 comments

Thanks for your great work! LLaMA-VID supports single-image input and video input, but does it support multi-image input? What's the quickest way to adapt to this input?

Thanks in advance!

g-h-chen avatar Mar 07 '24 08:03 g-h-chen

In current version, we do not support multi-image input. But you can support it by using multi-image instruction data like MIMIC-IT for instruction tuning.

yanwei-li avatar Apr 01 '24 04:04 yanwei-li