misc: playing with rl/verifiers

Open • ErikBjare opened this issue 9 months ago • 1 comment

Didn't get it to run, was just vibecoding with gptme.


[!IMPORTANT] Adds gptme_verifiers.py to implement an RL environment for training LLMs using gptme and verifiers, with classes for tool execution, state management, and reward calculation.

  • New Script:
    • Adds gptme_verifiers.py to implement an RL environment for training LLMs with gptme and verifiers.
  • Classes:
    • GPTMeRLEnv: Manages RL environment, tool execution, and reward calculation.
    • ToolResult, State: Define types for tool execution results and environment state.
    • ToolUseReward, TaskReward: Define reward structures for tool usage and task completion.
  • Functions:
    • reset(), step(), _execute_tool(), _calculate_tool_reward(), _calculate_task_reward(), _get_observation(), _is_done(): Manage environment lifecycle and reward logic.
    • main(): Demonstrates example usage of the environment.
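The file itself is not shown here, and the author notes it did not run, so the following is only a minimal sketch of how the pieces listed above might fit together. The class and method names come from the summary; every field, constructor parameter, and reward value is an assumption made for illustration, and the sketch uses plain Python callables as tools rather than calling gptme or verifiers.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolResult:
    """Outcome of one tool call (fields are assumed, not from the PR)."""
    output: str
    success: bool


@dataclass
class State:
    """Environment state: the task plus the tool results collected so far."""
    task: str
    history: list[ToolResult] = field(default_factory=list)
    steps: int = 0


@dataclass
class ToolUseReward:
    success: float = 0.1   # assumed reward for a tool call that succeeds
    failure: float = -0.1  # assumed penalty for a tool call that fails


@dataclass
class TaskReward:
    completed: float = 1.0  # assumed bonus once the task check passes


class GPTMeRLEnv:
    """Toy reset/step environment; tools are plain callables (hypothetical)."""

    def __init__(self, task: str, tools: dict[str, Callable[[str], str]],
                 is_solved: Callable[[State], bool], max_steps: int = 5):
        self.task = task
        self.tools = tools
        self.is_solved = is_solved
        self.max_steps = max_steps
        self.tool_reward = ToolUseReward()
        self.task_reward = TaskReward()
        self.state = State(task=task)

    def reset(self) -> str:
        self.state = State(task=self.task)
        return self._get_observation()

    def step(self, tool: str, arg: str) -> tuple[str, float, bool]:
        result = self._execute_tool(tool, arg)
        self.state.history.append(result)
        self.state.steps += 1
        reward = (self._calculate_tool_reward(result)
                  + self._calculate_task_reward())
        return self._get_observation(), reward, self._is_done()

    def _execute_tool(self, tool: str, arg: str) -> ToolResult:
        if tool not in self.tools:
            return ToolResult(output=f"unknown tool: {tool}", success=False)
        try:
            return ToolResult(output=self.tools[tool](arg), success=True)
        except Exception as exc:  # a failing tool becomes a failed result
            return ToolResult(output=str(exc), success=False)

    def _calculate_tool_reward(self, result: ToolResult) -> float:
        if result.success:
            return self.tool_reward.success
        return self.tool_reward.failure

    def _calculate_task_reward(self) -> float:
        return self.task_reward.completed if self.is_solved(self.state) else 0.0

    def _get_observation(self) -> str:
        if not self.state.history:
            return self.state.task
        return f"{self.state.task}\n{self.state.history[-1].output}"

    def _is_done(self) -> bool:
        return (self.is_solved(self.state)
                or self.state.steps >= self.max_steps)


def main() -> None:
    env = GPTMeRLEnv(
        task="say done",
        tools={"echo": lambda s: s},
        is_solved=lambda st: any(r.output == "done" for r in st.history),
    )
    print(env.reset())
    print(env.step("echo", "done"))


if __name__ == "__main__":
    main()
```

In this sketch an episode ends either when the task check passes or when `max_steps` is reached, and the per-step reward is the tool-use term plus the task-completion term. The split mirrors the `ToolUseReward` and `TaskReward` classes named in the summary, but the actual weighting in the PR is not known.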

This description was created by Ellipsis for 58d1c2dcd5213a6a21859370e18d1e43f09b5f6f. It will automatically update as commits are pushed.

ErikBjare • Feb 17 '25 22:02

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 67.21%. Comparing base (8845a1a) to head (58d1c2d).

:white_check_mark: All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #445   +/-   ##
=======================================
  Coverage   67.21%   67.21%           
=======================================
  Files          71       71           
  Lines        6304     6304           
=======================================
  Hits         4237     4237           
  Misses       2067     2067           
| Flag | Coverage Δ |
|---|---|
| anthropic/claude-3-haiku-20240307 | 65.97% <ø> (ø) |
| openai/gpt-4o-mini | 65.41% <ø> (ø) |

Flags with carried forward coverage won't be shown.

codecov-commenter • Feb 17 '25 22:02