LLaMA-VID
A question about the image token.
Hi authors, I would like to ask whether the image token is inserted into every question during multi-turn dialogue training. If so, what is the purpose of doing this? Is it to improve performance?
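To make the question concrete, here is a minimal sketch of the two layouts being compared. The `<image>` placeholder string, the `USER:`/`ASSISTANT:` turn format, and the helper `build_prompt` are assumptions for illustration only, not taken from the LLaMA-VID code.

```python
IMAGE_TOKEN = "<image>"  # assumed placeholder string, not confirmed from the repo


def build_prompt(questions, answers, every_turn):
    """Build a multi-turn prompt (hypothetical format).

    every_turn=True  -> the image token is prepended to every question
    every_turn=False -> the image token is prepended to the first question only
    """
    parts = []
    for i, (question, answer) in enumerate(zip(questions, answers)):
        if every_turn or i == 0:
            question = f"{IMAGE_TOKEN}\n{question}"
        parts.append(f"USER: {question}")
        parts.append(f"ASSISTANT: {answer}")
    return "\n".join(parts)


questions = ["What is in the picture?", "What color is it?"]
answers = ["A cat.", "Black."]

every = build_prompt(questions, answers, every_turn=True)
first = build_prompt(questions, answers, every_turn=False)

print(every)
print("---")
print(first)
```

The question is whether training uses the first layout (token repeated in each turn) and, if so, why it is preferred over the second.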