gptme misc: playing with rl/verifiers

Open ErikBjare opened this issue 8 months ago • 1 comment

Didn't get it to run, was just vibecoding with gptme.

[!IMPORTANT] Adds gptme_verifiers.py to implement an RL environment for training LLMs using gptme and verifiers, with classes for tool execution, state management, and reward calculation.

New Script:

Adds gptme_verifiers.py to implement an RL environment for training LLMs with gptme and verifiers.

Classes:

GPTMeRLEnv: Manages RL environment, tool execution, and reward calculation.

ToolResult, State: Define types for tool execution results and environment state.

ToolUseReward, TaskReward: Define reward structures for tool usage and task completion.

Functions:

reset(), step(), _execute_tool(), _calculate_tool_reward(), _calculate_task_reward(), _get_observation(), _is_done(): Manage environment lifecycle and reward logic.

main(): Demonstrates example usage of the environment.
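The lifecycle listed above could look roughly like the sketch below. It is not the contents of gptme_verifiers.py (which, per the description, never ran), and it imports neither gptme nor verifiers. The class and method names come from the summary; every field, constructor argument, reward value, and the toy `upper` tool are assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolResult:
    """Outcome of one tool call (fields are hypothetical)."""
    output: str
    success: bool


@dataclass
class State:
    """Environment state (fields are hypothetical)."""
    task: str
    history: list[str] = field(default_factory=list)
    steps: int = 0
    solved: bool = False


@dataclass
class ToolUseReward:
    """Per-call reward weights (values are assumptions)."""
    success: float = 0.1
    failure: float = -0.1


@dataclass
class TaskReward:
    """Reward for finishing the task (value is an assumption)."""
    completion: float = 1.0


class GPTMeRLEnv:
    """Toy stand-in: tools are plain callables, not real gptme tools."""

    def __init__(self, task: str, expected: str,
                 tools: dict[str, Callable[[str], str]], max_steps: int = 5):
        self.task, self.expected = task, expected
        self.tools, self.max_steps = tools, max_steps
        self.tool_reward, self.task_reward = ToolUseReward(), TaskReward()
        self.state = State(task=task)

    def reset(self) -> str:
        """Start a fresh episode and return the first observation."""
        self.state = State(task=self.task)
        return self._get_observation()

    def step(self, tool: str, arg: str) -> tuple[str, float, bool]:
        """Run one tool call; return (observation, reward, done)."""
        result = self._execute_tool(tool, arg)
        self.state.steps += 1
        self.state.history.append(result.output)
        self.state.solved = result.success and result.output == self.expected
        reward = self._calculate_tool_reward(result) + self._calculate_task_reward()
        return self._get_observation(), reward, self._is_done()

    def _execute_tool(self, tool: str, arg: str) -> ToolResult:
        fn = self.tools.get(tool)
        if fn is None:
            return ToolResult(output=f"unknown tool: {tool}", success=False)
        try:
            return ToolResult(output=fn(arg), success=True)
        except Exception as exc:  # a failing tool is a failed call, not a crash
            return ToolResult(output=f"error: {exc}", success=False)

    def _calculate_tool_reward(self, result: ToolResult) -> float:
        return self.tool_reward.success if result.success else self.tool_reward.failure

    def _calculate_task_reward(self) -> float:
        return self.task_reward.completion if self.state.solved else 0.0

    def _get_observation(self) -> str:
        return "\n".join([self.state.task, *self.state.history])

    def _is_done(self) -> bool:
        return self.state.solved or self.state.steps >= self.max_steps


def main() -> None:
    env = GPTMeRLEnv(task="Uppercase 'hi'", expected="HI", tools={"upper": str.upper})
    print(env.reset())
    print(env.step("nope", "hi"))   # unknown tool: penalized, episode continues
    print(env.step("upper", "hi"))  # correct output: tool + task reward, done


if __name__ == "__main__":
    main()
```

Splitting the reward into a small per-call term and a larger completion term keeps the two signals separately tunable, which is presumably why the summary lists ToolUseReward and TaskReward as distinct structures.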

This description was created automatically for 58d1c2dcd5213a6a21859370e18d1e43f09b5f6f. It will automatically update as commits are pushed.

Feb 17 '25 22:02 ErikBjare
