gptme misc: playing with rl/verifiers


Open: ErikBjare opened this issue 9 months ago • 1 comment

Didn't get it to run, was just vibecoding with gptme.

[!IMPORTANT] Adds `gptme_verifiers.py` to implement an RL environment for training LLMs using gptme and verifiers, with classes for tool execution, state management, and reward calculation.

New Script:

Adds `gptme_verifiers.py` to implement an RL environment for training LLMs with gptme and verifiers.

Classes:

`GPTMeRLEnv`: Manages the RL environment, tool execution, and reward calculation.

`ToolResult`, `State`: Define types for tool execution results and environment state.

`ToolUseReward`, `TaskReward`: Define reward structures for tool usage and task completion.

Functions:

`reset()`, `step()`, `_execute_tool()`, `_calculate_tool_reward()`, `_calculate_task_reward()`, `_get_observation()`, `_is_done()`: Manage the environment lifecycle and reward logic.

`main()`: Demonstrates example usage of the environment.
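The script itself is not shown on this page, so the following is only a rough sketch of how an environment with the method names listed above might fit together. Everything in it is an assumption made for illustration: the `ToyRLEnv` class, the fake `echo` tool, the `"<tool> <argument>"` action format, the reward values, and the field layout of `ToolResult` and `State`. It does not call gptme or verifiers and should not be read as the contents of `gptme_verifiers.py`.

```python
from dataclasses import dataclass, field


@dataclass
class ToolResult:
    """Outcome of one tool call (hypothetical shape)."""
    success: bool
    output: str


@dataclass
class State:
    """Environment state (hypothetical shape)."""
    task: str
    history: list = field(default_factory=list)
    steps: int = 0
    solved: bool = False


class ToyRLEnv:
    """Toy stand-in for an environment like GPTMeRLEnv; runs no real tools."""

    def __init__(self, task: str, target: str, max_steps: int = 5):
        self.task = task
        self.target = target
        self.max_steps = max_steps
        self.state = State(task=task)

    def reset(self) -> str:
        # Start a fresh episode and return the first observation.
        self.state = State(task=self.task)
        return self._get_observation()

    def step(self, action: str):
        # Run the action, update state, and return (observation, reward, done).
        result = self._execute_tool(action)
        self.state.history.append((action, result))
        self.state.steps += 1
        self.state.solved = result.success and result.output == self.target
        reward = self._calculate_tool_reward(result) + self._calculate_task_reward()
        return self._get_observation(), reward, self._is_done()

    def _execute_tool(self, action: str) -> ToolResult:
        # Assumed action format: "<tool> <argument>"; only a fake "echo" exists.
        tool, _, arg = action.partition(" ")
        if tool == "echo":
            return ToolResult(True, arg)
        return ToolResult(False, f"unknown tool: {tool}")

    def _calculate_tool_reward(self, result: ToolResult) -> float:
        # Small bonus for a successful tool call, small penalty otherwise.
        return 0.1 if result.success else -0.1

    def _calculate_task_reward(self) -> float:
        # Larger reward once the task target has been produced.
        return 1.0 if self.state.solved else 0.0

    def _get_observation(self) -> str:
        # Before any action the observation is the task; after, the last output.
        if not self.state.history:
            return self.task
        return self.state.history[-1][1].output

    def _is_done(self) -> bool:
        return self.state.solved or self.state.steps >= self.max_steps


def main():
    """Run one scripted episode and return (last observation, total reward, done)."""
    env = ToyRLEnv(task="Print 'hello'", target="hello")
    obs, total, done = env.reset(), 0.0, False
    for action in ["ls", "echo hello"]:
        obs, reward, done = env.step(action)
        total += reward
        if done:
            break
    return obs, total, done


if __name__ == "__main__":
    print(main())
```

Splitting the reward into a per-call tool term and a task-completion term mirrors the `ToolUseReward` / `TaskReward` distinction in the description; how the real script weights the two is not stated here.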

This description was created automatically for 58d1c2dcd5213a6a21859370e18d1e43f09b5f6f. It will automatically update as commits are pushed.

Feb 17 '25 22:02 ErikBjare

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 67.21%. Comparing base (8845a1a) to head (58d1c2d).

:white_check_mark: All tests successful. No failed tests found.

Additional details and impacted files

@@           Coverage Diff           @@
##           master     #445   +/-   ##
=======================================
  Coverage   67.21%   67.21%           
=======================================
  Files          71       71           
  Lines        6304     6304           
=======================================
  Hits         4237     4237           
  Misses       2067     2067
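As a quick sanity check on the figures above, the coverage percentage can be recomputed from the hit and miss counts. This is a small sketch that assumes coverage is simply hits divided by total lines, rounded to two decimals; Codecov's exact rounding rules are not stated in the report.

```python
# Figures copied from the coverage diff above.
hits, misses = 4237, 2067

lines = hits + misses                    # total coverable lines
coverage = round(100 * hits / lines, 2)  # percentage, two decimals (assumed rounding)

print(lines, coverage)
```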

| Flag | Coverage Δ |
|---|---|
| anthropic/claude-3-haiku-20240307 | `65.97% <ø> (ø)` |
| openai/gpt-4o-mini | `65.41% <ø> (ø)` |

Flags with carried forward coverage won't be shown.


Feb 17 '25 22:02 codecov-commenter
