LLaVA
Finetuning vision encoder part
feature
Hi, is it possible with the current code to fine-tune both the vision encoder and the projector? Thanks.
I found in the loader file that we have self.vision_tower.requires_grad_(False). Should I just comment this out?
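For anyone trying this, here is a minimal sketch of what unfreezing might involve. It uses toy stand-in modules, not LLaVA's real classes: `ToyVisionTower`, `encoder_receives_grad`, and the learning rates are all hypothetical. It also illustrates one thing worth checking in the real code: if the encoder's forward pass runs under `torch.no_grad()`, then removing `requires_grad_(False)` alone would not be enough, because no gradient reaches the encoder weights.

```python
import torch
from torch import nn


class ToyVisionTower(nn.Module):
    """Hypothetical stand-in for a vision encoder (not LLaVA's class)."""

    def __init__(self, use_no_grad: bool):
        super().__init__()
        self.encoder = nn.Linear(8, 8)
        self.use_no_grad = use_no_grad

    def forward(self, x):
        if self.use_no_grad:
            # Mimics a forward pass wrapped in torch.no_grad():
            # the output is detached from the encoder weights.
            with torch.no_grad():
                return self.encoder(x)
        return self.encoder(x)


def encoder_receives_grad(use_no_grad: bool) -> bool:
    vision_tower = ToyVisionTower(use_no_grad)
    projector = nn.Linear(8, 4)

    vision_tower.requires_grad_(False)  # frozen, as in the loader
    vision_tower.requires_grad_(True)   # unfrozen for fine-tuning

    # Separate parameter groups so the encoder can use a smaller
    # learning rate than the projector (values are assumptions).
    optimizer = torch.optim.AdamW([
        {"params": vision_tower.parameters(), "lr": 2e-6},
        {"params": projector.parameters(), "lr": 2e-5},
    ])

    loss = projector(vision_tower(torch.randn(2, 8))).sum()
    loss.backward()
    optimizer.step()

    # True only if every encoder weight actually got a gradient.
    return all(p.grad is not None for p in vision_tower.parameters())
```

In this toy setup, `encoder_receives_grad(False)` reports that the encoder is being trained, while `encoder_receives_grad(True)` shows the encoder silently staying fixed even though its parameters have `requires_grad=True`.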
Have you tried fine-tuning the vision model?
I am also stuck on how to properly fine-tune the CLIP vision encoder, or even swap it out for something else. Are you still working on this? Could you please share an update?