VILA
Can I fine-tune NVILA with multiple images?
I read the instructions at https://github.com/NVlabs/VILA/tree/main/finetuning but they only show how to fine-tune with a single-image QA set. Since NVILA can take multiple images as input for inference, would it be possible to fine-tune with multiple images?