Configs and updates to the Gaia eval
- Use tool calling
- Evaluate the Apriel model
- Add a README
Description by Korbit AI
What change is being made?
Update the Gaia evaluation configs and flow to use the Apriel 1.5 LLM, switch to function-call style guidance, extend the max_turns settings, and wire the LLM endpoint configuration through an environment variable. Also rename the action class, adjust the Gaia Gym/Benchmark wiring, and add a supporting experiment and docs.
Why are these changes being made?
Switch to an Apriel-based evaluation setup with function-call style guidance to improve tool use and reliability; expose the LLM base URL via an environment variable for flexibility; extend max_turns to better control evaluation length and align env creation with the new config fields; and update the docs to guide setup.
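As a rough illustration of the environment-based endpoint wiring described above, here is a minimal sketch. The variable names `LLM_BASE_URL` and `GAIA_MAX_TURNS`, the default URL, the default turn count, the model identifier string, and the `LLMConfig` / `load_llm_config` names are all assumptions made for this example; the names actually used in this PR may differ.

```python
import os
from dataclasses import dataclass

# Hypothetical defaults, chosen only for illustration.
DEFAULT_BASE_URL = "http://localhost:8000/v1"
DEFAULT_MAX_TURNS = "30"


@dataclass
class LLMConfig:
    model_name: str
    base_url: str
    max_turns: int


def load_llm_config(env=None) -> LLMConfig:
    """Build an LLM config from environment variables (names are assumed)."""
    env = os.environ if env is None else env
    # Strip a trailing slash so later path joins do not produce "//".
    base_url = env.get("LLM_BASE_URL", DEFAULT_BASE_URL).rstrip("/")
    max_turns = int(env.get("GAIA_MAX_TURNS", DEFAULT_MAX_TURNS))
    if max_turns <= 0:
        raise ValueError("max_turns must be a positive integer")
    # "apriel-1.5" is a placeholder identifier, not a confirmed model name.
    return LLMConfig(model_name="apriel-1.5", base_url=base_url, max_turns=max_turns)


if __name__ == "__main__":
    print(load_llm_config())
```

Reading the endpoint at load time keeps the checked-in configs free of host-specific URLs, so the same config can be pointed at a different server by exporting one variable before the run.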