LLaMA-VID Multi-image inference

Multi-image inference

Open g-h-chen opened this issue 11 months ago • 1 comments

Thanks for your great work! LLaMA-VID supports single-image input and video input, but does it support multi-image input? What's the quickest way to adapt to this input?

Thanks in advance!

Mar 07 '24 08:03 g-h-chen

In current version, we do not support multi-image input. But you can support it by using multi-image instruction data like MIMIC-IT for instruction tuning.

Apr 01 '24 04:04 yanwei-li

LLaMA-VID LLaMA-VID copied to clipboard

Multi-image inference

LLaMA-VID
LLaMA-VID copied to clipboard