gptme
misc: playing with rl/verifiers
Didn't get it to run; I was just vibecoding with gptme.
> [!IMPORTANT]
> Adds `gptme_verifiers.py` to implement an RL environment for training LLMs using `gptme` and `verifiers`, with classes for tool execution, state management, and reward calculation.
- New Script:
  - Adds `gptme_verifiers.py` to implement an RL environment for training LLMs with `gptme` and `verifiers`.
- Classes:
  - `GPTMeRLEnv`: Manages RL environment, tool execution, and reward calculation.
  - `ToolResult`, `State`: Define types for tool execution results and environment state.
  - `ToolUseReward`, `TaskReward`: Define reward structures for tool usage and task completion.
- Functions:
  - `reset()`, `step()`, `_execute_tool()`, `_calculate_tool_reward()`, `_calculate_task_reward()`, `_get_observation()`, `_is_done()`: Manage environment lifecycle and reward logic.
  - `main()`: Demonstrates example usage of the environment.

This description was created for 58d1c2dcd5213a6a21859370e18d1e43f09b5f6f. It will automatically update as commits are pushed.
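
For readers who want a feel for the environment lifecycle listed above, here is a minimal, standalone sketch. It is not the code in `gptme_verifiers.py` and it does not call `gptme` or `verifiers`. The class and method names follow the summary, but every field, signature, reward value, and the toy `upper` tool are assumptions made for illustration. `ToolUseReward` and `TaskReward` are collapsed into plain floats to keep it short.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ToolResult:
    """Outcome of one tool call (fields are assumed, not taken from the PR)."""
    output: str
    success: bool


@dataclass
class State:
    """Environment state: the task prompt plus the tool-call history."""
    task: str
    history: List[ToolResult] = field(default_factory=list)
    steps: int = 0


class GPTMeRLEnv:
    """Toy environment with the lifecycle methods named in the summary."""

    def __init__(self, task: str, expected: str,
                 tools: Dict[str, Callable[[str], str]], max_steps: int = 5):
        self.task = task
        self.expected = expected      # hypothetical success criterion
        self.tools = tools            # hypothetical tool registry
        self.max_steps = max_steps
        self.state = State(task=task)

    def reset(self) -> str:
        """Start a fresh episode and return the first observation."""
        self.state = State(task=self.task)
        return self._get_observation()

    def step(self, tool_name: str, arg: str) -> Tuple[str, float, bool]:
        """Run one tool call and return (observation, reward, done)."""
        result = self._execute_tool(tool_name, arg)
        self.state.history.append(result)
        self.state.steps += 1
        reward = self._calculate_tool_reward(result) + self._calculate_task_reward()
        return self._get_observation(), reward, self._is_done()

    def _execute_tool(self, name: str, arg: str) -> ToolResult:
        tool = self.tools.get(name)
        if tool is None:
            return ToolResult(f"unknown tool: {name}", False)
        try:
            return ToolResult(tool(arg), True)
        except Exception as exc:  # a failing tool call is data, not a crash
            return ToolResult(f"error: {exc}", False)

    def _calculate_tool_reward(self, result: ToolResult) -> float:
        # Assumed shaping: small bonus for a working call, small penalty otherwise.
        return 0.1 if result.success else -0.1

    def _solved(self) -> bool:
        last = self.state.history[-1] if self.state.history else None
        return last is not None and last.success and last.output.strip() == self.expected

    def _calculate_task_reward(self) -> float:
        return 1.0 if self._solved() else 0.0

    def _get_observation(self) -> str:
        # Before any tool call the agent sees the task; afterwards, the last output.
        if not self.state.history:
            return self.state.task
        return self.state.history[-1].output

    def _is_done(self) -> bool:
        return self._solved() or self.state.steps >= self.max_steps


def main() -> None:
    env = GPTMeRLEnv(task="Uppercase 'hi'", expected="HI",
                     tools={"upper": lambda s: s.upper()})
    print(env.reset())
    print(env.step("upper", "hi"))


if __name__ == "__main__":
    main()
```

Returning `(observation, reward, done)` from `step()` mirrors the common Gym-style convention. Whether the actual script uses that shape is not stated in the description.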