LLaVA-NeXT
Any reason for also training vision encoder?
It seems that the training recipe for LLaVA changed with LLaVA-NeXT. Previously, the convention during instruction tuning was to fine-tune only the connector (projector) and the LLM while keeping the vision encoder frozen; now the vision encoder is trained as well. Is there a reason for this change? Does it improve performance, or could it actually hurt?
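
For concreteness, here is a minimal sketch of the two regimes I mean, assuming the Hugging Face `transformers` port of LLaVA-NeXT (the checkpoint name and the parameter-name filtering are just illustrative, not the official training code):

```python
import torch
from transformers import LlavaNextForConditionalGeneration

# Illustrative checkpoint; any LLaVA-NeXT checkpoint would do.
model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-mistral-7b-hf", torch_dtype=torch.bfloat16
)

# Older LLaVA-style instruction tuning: freeze the vision encoder,
# train only the projector (connector) and the LLM.
for name, param in model.named_parameters():
    param.requires_grad = "vision_tower" not in name

# LLaVA-NeXT-style instruction tuning: the vision encoder is trainable too
# (as I understand it, typically with a smaller learning rate than the LLM).
for param in model.parameters():
    param.requires_grad = True
```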