LLaMA-VID
LLaMA-VID copied to clipboard
Multi-image inference
Thanks for your great work! LLaMA-VID supports single-image input and video input, but does it support multi-image input? What's the quickest way to adapt to this input?
Thanks in advance!
In current version, we do not support multi-image input. But you can support it by using multi-image instruction data like MIMIC-IT for instruction tuning.