FEAT add support for reasoning models as scorers
Is your feature request related to a problem? Please describe.
Right now, there is no support for models that output tokens in addition to the JSON containing the score. This is a problem for reasoning models like deepseek-r1, which output thinking tokens before the JSON.
Describe the solution you'd like
Proposed in PR #719: replace remove_markdown_json, which only strips the "```" tokens, with logic that extracts the content between the "{" and "}" tokens.
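The brace-extraction idea above can be sketched as follows. This is a minimal illustration, not the code from PR #719. It assumes the slice runs from the first "{" to the last "}" with both braces kept, and extract_json_object is a hypothetical name. The sample response mimics a deepseek-r1-style <think> block followed by a fenced JSON answer.

```python
import json
from typing import Any


def extract_json_object(text: str) -> Any:
    """Parse the JSON object embedded in a model response.

    Hypothetical helper (assumption, not the PR's code): keeps everything from
    the first "{" to the last "}" (braces included) and discards surrounding
    text such as markdown fences or <think>...</think> reasoning blocks.
    """
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    return json.loads(text[start : end + 1])


fence = "`" * 3  # built programmatically so this snippet can sit inside markdown
response = (
    "<think>\nThe answer names the right capital, so the score should be high.\n</think>\n"
    f"{fence}json\n"
    '{"score": 0.9, "reason": "The answer is factually correct."}\n'
    f"{fence}"
)

print(extract_json_object(response))
```

One caveat of this simple slice: if the reasoning text itself contains a "{", the slice starts too early and json.loads fails. Stripping the <think>...</think> block first, or scanning candidates with json.JSONDecoder.raw_decode, would be more robust.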
Describe alternatives you've considered, if relevant
Additional context
Many scorers rely on getting JSON-formatted responses. Removing the curly braces would break that behavior. Isn't the response from a reasoning model also going to be text? In what way is that different? Can you provide an illustrative example?
I'd love to help or at least suggest a way forward, but I suspect I'm missing something critical.
In the PR I kept the curly braces, and the code works. I am using that modified version for my thesis, but I know it would need additional testing before it can go to production. I need some help writing those tests, though (I don't have experience with that).
It handles markdown code fences around the JSON, thinking tokens, or any other text surrounding the JSON.
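As a starting point for the requested tests, here is a pytest-style sketch covering the cases listed above: plain JSON, markdown fences, thinking tokens, and surrounding prose. The helper inside is a hypothetical stand-in that slices from the first "{" to the last "}". The real function in PR #719 may differ in name and behavior, so the tests should be pointed at it instead. The tests use bare assert so pytest collects them, and they can also be called directly.

```python
import json
from typing import Any


def extract_json_object(text: str) -> Any:
    # Hypothetical helper under test (assumption): slice from first "{" to last "}".
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    return json.loads(text[start : end + 1])


FENCE = "`" * 3  # built programmatically so this snippet can sit inside markdown
PAYLOAD = '{"score": 1, "reason": "ok"}'
EXPECTED = {"score": 1, "reason": "ok"}


def test_plain_json():
    assert extract_json_object(PAYLOAD) == EXPECTED


def test_markdown_fence():
    text = f"{FENCE}json\n{PAYLOAD}\n{FENCE}"
    assert extract_json_object(text) == EXPECTED


def test_thinking_tokens():
    text = f"<think>Let me check the answer first.</think>\n{PAYLOAD}"
    assert extract_json_object(text) == EXPECTED


def test_surrounding_prose():
    text = f"Here is my evaluation:\n{PAYLOAD}\nHope this helps!"
    assert extract_json_object(text) == EXPECTED


def test_nested_objects_are_kept():
    text = 'Result: {"score": 1, "details": {"reason": "ok"}}'
    assert extract_json_object(text) == {"score": 1, "details": {"reason": "ok"}}


def test_missing_json_raises():
    try:
        extract_json_object("<think>no verdict</think>")
    except ValueError:
        return
    raise AssertionError("expected ValueError")
```

A useful addition would be a test for reasoning text that itself contains braces, since a first-"{"/last-"}" slice is expected to fail on that input.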