[Feature] GRPO to fine tune InternVL2.5

Open paulpacaud opened this issue 9 months ago • 0 comments

Motivation

Would it be possible to add a GRPO fine-tuning stage to InternVL (2.5)? I believe it would be great to teach InternVL to reason not by specifying the rationales through SFT, but by letting it discover them through RL.

Related resources

No response

Additional context

No response

paulpacaud, Mar 05 '25 08:03