Use the llava-onevision weights as a starting point for finetuning on a custom dataset
Hello, thanks a lot for sharing your training code! In the training script (https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/scripts/train/finetune_onevision.sh), is it possible to use the trained llava-onevision weights as a starting point for training on a custom dataset? Currently, we can only specify the vision encoder (SigLIP), the LLM (Qwen-Instruct), and the pre-trained adapter. Could we instead extract these components from the trained llava-onevision weights? That way, we could leverage the model's instruction-tuned abilities and continue fine-tuning on new instructions. I remember this was possible with LLaVA 1.5; I wonder how to do it with this new training script.
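For clarity, here is a minimal sketch of what I have in mind, assuming the script accepts a full checkpoint via `--model_name_or_path` (in which case `--pretrain_mm_mlp_adapter` would presumably be omitted, since the released checkpoint should already contain the projector). The model id, data path, and hyperparameters below are illustrative, not tested:

```bash
# Untested sketch: pass the full released llava-onevision checkpoint instead of
# assembling the model from a separate encoder, LLM, and adapter.
VISION_MODEL_VERSION="google/siglip-so400m-patch14-384"
PROMPT_VERSION="qwen_1_5"

deepspeed llava/train/train_mem.py \
    --deepspeed scripts/zero3.json \
    --model_name_or_path lmms-lab/llava-onevision-qwen2-7b-ov \
    --version ${PROMPT_VERSION} \
    --data_path /path/to/custom_data.yaml \
    --image_folder /path/to/images \
    --vision_tower ${VISION_MODEL_VERSION} \
    --mm_projector_type mlp2x_gelu \
    --bf16 True \
    --output_dir ./checkpoints/onevision-custom-ft \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --learning_rate 1e-5 \
    --report_to none
    # note: no --pretrain_mm_mlp_adapter here, on the assumption that the
    # projector weights are loaded from the full checkpoint
```

Would something along these lines work, or does the script strictly require the three components to be given separately?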
I have the same problem.
Did anybody figure it out?