Use the llava-onevision weights as a starting point for finetuning on a custom dataset
Hello, thanks a lot for sharing your training code! In the training script (https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/scripts/train/finetune_onevision.sh), is it possible to use the trained llava-onevision weights as a starting point for training on a custom dataset? Currently, we can only specify the vision encoder (SigLIP), the LLM (Qwen-Instruct), and the pre-trained adapter. Could we instead extract these components from the trained llava-onevision weights? That way, we could leverage the model's instruction-tuned abilities and continue fine-tuning on new instructions. I remember this was possible with LLaVA 1.5; I wonder how to do it with this new training script.
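For clarity, here is a minimal sketch of what I have in mind, assuming the script accepts a full checkpoint via `--model_name_or_path` (in which case `--pretrain_mm_mlp_adapter` would presumably be omitted, since the released checkpoint should already contain the projector). The model id, data path, and hyperparameters below are illustrative, not tested:

```bash
# Untested sketch: pass the full released llava-onevision checkpoint instead of
# assembling the model from a separate encoder, LLM, and adapter.
VISION_MODEL_VERSION="google/siglip-so400m-patch14-384"
PROMPT_VERSION="qwen_1_5"

deepspeed llava/train/train_mem.py \
    --deepspeed scripts/zero3.json \
    --model_name_or_path lmms-lab/llava-onevision-qwen2-7b-ov \
    --version ${PROMPT_VERSION} \
    --data_path /path/to/custom_data.yaml \
    --image_folder /path/to/images \
    --vision_tower ${VISION_MODEL_VERSION} \
    --mm_projector_type mlp2x_gelu \
    --bf16 True \
    --output_dir ./checkpoints/onevision-custom-ft \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --learning_rate 1e-5 \
    --report_to none
    # note: no --pretrain_mm_mlp_adapter here, on the assumption that the
    # projector weights are loaded from the full checkpoint
```

Would something along these lines work, or does the script strictly require the three components to be given separately?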
I have the same problem.
Did anybody figure it out?