[Feature] GRPO to fine-tune InternVL2.5

paulpacaud opened this issue

Motivation

Would it be possible to add a GRPO fine-tuning stage to InternVL (2.5)? It would be valuable to teach InternVL to reason without supplying rationales SFT-style, and instead let it discover them through RL.
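For context, GRPO (Group Relative Policy Optimization, introduced with DeepSeekMath) samples a group of responses per prompt, scores each one with a reward function, and normalizes the rewards within the group to get advantages. Because of this, it needs neither a learned value model nor reference rationales. The following is a minimal Python sketch of that advantage step only, under stated assumptions. The `<answer>...</answer>` output format and the exact-match `accuracy_reward` function are hypothetical illustrations, not part of InternVL. A full GRPO trainer would also need the clipped policy-ratio objective and the KL penalty, which are omitted here.

```python
import re
from statistics import mean, pstdev


def accuracy_reward(completion: str, reference: str) -> float:
    """Hypothetical rule-based reward: 1.0 if the text inside
    <answer>...</answer> exactly matches the reference, else 0.0."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match is None:
        return 0.0  # no parsable answer -> no reward
    return 1.0 if match.group(1).strip() == reference.strip() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group: (r_i - mean) / (std + eps).
    Uses the population std here; implementations differ on this detail."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt ("What is 2 + 2?"), a group of four sampled completions.
completions = [
    "<think>2 + 2 = 4</think><answer>4</answer>",
    "<answer>5</answer>",
    "The answer is 4.",  # right value, but not in the expected format
    "<answer> 4 </answer>",
]
rewards = [accuracy_reward(c, "4") for c in completions]
advantages = group_relative_advantages(rewards)
print(rewards)
print([round(a, 3) for a in advantages])
```

Correct completions get positive advantages and incorrect ones get negative advantages. This pushes the policy toward whatever reasoning produced correct answers, which is the behaviour this request is after. If every completion in a group receives the same reward, all advantages are zero and that group contributes no learning signal.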

Related resources

No response

Additional context

No response

Mar 05 '25 08:03 paulpacaud