InternVL
[Feature] GRPO to fine-tune InternVL2.5
Motivation
Would it be possible to add a GRPO fine-tuning stage to InternVL 2.5? I believe it would be valuable to teach InternVL to reason without supplying the rationales SFT-style, and instead let it discover them through RL.
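For context, GRPO (Group Relative Policy Optimization, introduced in DeepSeekMath) drops the learned value model used by PPO. Instead, it samples a group of responses for each prompt and normalizes each response's reward against the group's mean and standard deviation. That is what lets a plain outcome reward (for example, "is the final answer correct?") drive reasoning without annotated rationales. The sketch below shows only that advantage step. It assumes a hypothetical binary correctness reward and uses the population standard deviation; real implementations may differ in both choices, and the helper name `group_relative_advantages` is illustrative, not part of InternVL.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one group of responses sampled for the same prompt.

    Each response's advantage is (r - group mean) / (group std + eps), so
    responses better than their siblings get positive advantages and worse
    ones get negative advantages. No value model is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std (assumption; implementations vary)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical group of 4 sampled answers to one image-question prompt,
# scored 1.0 if the final answer is correct and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

If every response in a group gets the same reward, all advantages are zero and that prompt contributes no learning signal. This is one reason GRPO setups typically sample several responses per prompt.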
Related resources
No response
Additional context
No response