misc: playing with rl/verifiers

Open ErikBjare opened this issue 8 months ago • 1 comment

Didn't get it to run, was just vibecoding with gptme.


[!IMPORTANT] Adds gptme_verifiers.py to implement an RL environment for training LLMs using gptme and verifiers, with classes for tool execution, state management, and reward calculation.

  • New Script:
    • Adds gptme_verifiers.py to implement an RL environment for training LLMs with gptme and verifiers.
  • Classes:
    • GPTMeRLEnv: Manages RL environment, tool execution, and reward calculation.
    • ToolResult, State: Define types for tool execution results and environment state.
    • ToolUseReward, TaskReward: Define reward structures for tool usage and task completion.
  • Functions:
    • reset(), step(), _execute_tool(), _calculate_tool_reward(), _calculate_task_reward(), _get_observation(), _is_done(): Manage environment lifecycle and reward logic.
    • main(): Demonstrates example usage of the environment.
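For readers who want a feel for the shape of such an environment, here is a rough, untested-against-gptme sketch of the reset()/step() loop described above. It is not the code from gptme_verifiers.py: it does not import gptme or verifiers, `SketchEnv` is a hypothetical stand-in for GPTMeRLEnv, and the dataclass fields, the `_solved()` helper, the constructor arguments, and the reward values (±0.1 per tool call, 1.0 on task completion) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolResult:
    """Outcome of one tool call (field names are assumptions)."""
    tool: str
    output: str
    success: bool


@dataclass
class State:
    """Environment state (field names are assumptions)."""
    task: str
    history: list[ToolResult] = field(default_factory=list)
    steps: int = 0


class SketchEnv:
    """Toy stand-in for GPTMeRLEnv; tools are plain Python callables."""

    def __init__(self, tools: dict[str, Callable[[str], str]],
                 target: str, max_steps: int = 5):
        self.tools = tools          # tool name -> callable taking one string
        self.target = target        # output that counts as "task solved"
        self.max_steps = max_steps
        self.state = State(task="")

    def reset(self, task: str) -> str:
        """Start a new episode and return the first observation."""
        self.state = State(task=task)
        return self._get_observation()

    def step(self, tool: str, arg: str) -> tuple[str, float, bool]:
        """Run one tool call; return (observation, reward, done)."""
        result = self._execute_tool(tool, arg)
        self.state.history.append(result)
        self.state.steps += 1
        reward = self._calculate_tool_reward(result) + self._calculate_task_reward()
        return self._get_observation(), reward, self._is_done()

    def _execute_tool(self, tool: str, arg: str) -> ToolResult:
        fn = self.tools.get(tool)
        if fn is None:
            return ToolResult(tool, f"unknown tool: {tool}", False)
        try:
            return ToolResult(tool, fn(arg), True)
        except Exception as exc:  # a failing tool is a result, not a crash
            return ToolResult(tool, str(exc), False)

    def _calculate_tool_reward(self, result: ToolResult) -> float:
        # Assumed shaping: small bonus for a working call, small penalty otherwise.
        return 0.1 if result.success else -0.1

    def _solved(self) -> bool:
        # Hypothetical completion check: some successful output equals the target.
        return any(r.success and r.output == self.target
                   for r in self.state.history)

    def _calculate_task_reward(self) -> float:
        return 1.0 if self._solved() else 0.0

    def _get_observation(self) -> str:
        # Before any tool call the observation is the task text itself.
        if not self.state.history:
            return self.state.task
        return self.state.history[-1].output

    def _is_done(self) -> bool:
        return self._solved() or self.state.steps >= self.max_steps


def main() -> None:
    env = SketchEnv(tools={"upper": str.upper}, target="HELLO")
    print(env.reset("uppercase the word hello"))
    print(env.step("missing", "hello"))  # unknown tool: penalised, episode continues
    print(env.step("upper", "hello"))    # matches the target: episode ends


if __name__ == "__main__":
    main()
```

Keeping the tool reward and the task reward as separate methods mirrors the ToolUseReward / TaskReward split listed above, so either signal can be reweighted without touching the other.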

This description was created by Ellipsis for 58d1c2dcd5213a6a21859370e18d1e43f09b5f6f. It will automatically update as commits are pushed.

ErikBjare, Feb 17 '25 22:02