LLaMA-Adapter
Multi-image inputs to the model
Hi, I was wondering whether it is possible to prompt the model with more than one image. In the implementation, the visual tokens are incorporated by simply adding them to the adapter layer tokens (https://huggingface.co/spaces/csuhan/LLaMA-Adapter/blob/48d8b02c0c335145b8b3d1ca7162ac42979bec93/llama/model.py#L357). Have you tried supporting multiple image inputs by adding more than one set of visual tokens to the adapters?
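To make the question concrete, here is a rough sketch of two ways multiple images might be combined with the adapter tokens. It is not the repository's code: plain Python lists stand in for tensors, and the names `add_visual`, `fuse_by_mean`, and `fuse_by_concat` are hypothetical. It also assumes each image has already been reduced to a single feature vector with the same dimension as the adapter tokens.

```python
# Hypothetical sketch, not the LLaMA-Adapter implementation.
# An adapter prompt is a list of token vectors; each image is assumed
# to be one feature vector of the same dimension.

def add_visual(adapter, visual):
    """Add one visual feature vector to every adapter prompt token."""
    return [[a + v for a, v in zip(token, visual)] for token in adapter]

def fuse_by_mean(adapter, visuals):
    """Option A: average the per-image features, then add them once.
    The adapter prompt length stays the same."""
    n = len(visuals)
    mean = [sum(column) / n for column in zip(*visuals)]
    return add_visual(adapter, mean)

def fuse_by_concat(adapter, visuals):
    """Option B: build one shifted copy of the adapter prompt per image.
    The adapter prompt length grows with the number of images."""
    fused = []
    for visual in visuals:
        fused.extend(add_visual(adapter, visual))
    return fused

adapter = [[0.0, 1.0], [2.0, 3.0]]  # 2 prompt tokens, dimension 2
visuals = [[1.0, 1.0], [3.0, 5.0]]  # 2 images, one feature vector each

mean_fused = fuse_by_mean(adapter, visuals)      # still 2 tokens
concat_fused = fuse_by_concat(adapter, visuals)  # 4 tokens (2 per image)
```

Option A keeps the prompt length fixed but loses which feature came from which image. Option B keeps the images separate but changes the prefix length the model sees, and I am not sure whether that would work without further fine-tuning.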