Finetuning on Video Question Answering

Open shahedmomenzadeh opened this issue 4 months ago • 2 comments

Hi, is there a guide on how to fine-tune the InternVL model series on a video question answering dataset?

Aug 26 '25 14:08 shahedmomenzadeh

Thank you for your interest in our work. You can refer to our documentation for more details. We will update the documentation for InternVL3.5 soon. In the meantime, you can refer to the InternVL3 documentation and the GPT-OSS launch scripts to finetune InternVL3.5. Notably, if you are finetuning the Qwen3-based InternVL3.5, you need to change --conv_style "internvl3_5_gpt_oss" to --conv_style "internvl2_5", while leaving the other hyperparameters unchanged.
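The flag change described above can be sketched in a few lines of Python, assuming the launch command is available as a list of arguments. The helper name swap_conv_style and the placeholder --other_flag are hypothetical and not part of the InternVL repository; only --conv_style and its two values come from this thread.

```python
from typing import List


def swap_conv_style(args: List[str], new_style: str = "internvl2_5") -> List[str]:
    """Return a copy of `args` with the value of --conv_style replaced.

    Hypothetical helper for illustration; every other argument is left untouched.
    """
    out = list(args)
    for i, arg in enumerate(out):
        # Form 1: "--conv_style", "<value>" as two separate tokens.
        if arg == "--conv_style":
            if i + 1 >= len(out):
                raise ValueError("--conv_style is missing its value")
            out[i + 1] = new_style
            return out
        # Form 2: "--conv_style=<value>" as a single token.
        if arg.startswith("--conv_style="):
            out[i] = f"--conv_style={new_style}"
            return out
    raise ValueError("--conv_style not found in arguments")


# Placeholder argument list: --other_flag stands in for the remaining
# hyperparameters of the GPT-OSS launch script, which stay unchanged.
gpt_oss_args = ["--conv_style", "internvl3_5_gpt_oss", "--other_flag", "value"]
qwen3_args = swap_conv_style(gpt_oss_args)
print(" ".join(qwen3_args))
```

In practice the same change is a one-line edit to the launch script; the sketch only makes explicit that nothing other than the conversation style differs for the Qwen3-based models.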

Aug 29 '25 13:08 Weiyun1025

I am also interested in the updated InternVL3.5 fine-tuning documentation. @Weiyun1025 Do you have an estimate of when the updated documentation will be released?

Sep 07 '25 20:09 paulpacaud