gptme
misc: playing with rl/verifiers
Didn't get it to run, was just vibecoding with gptme.
[!IMPORTANT]
Adds `gptme_verifiers.py` to implement an RL environment for training LLMs using `gptme` and `verifiers`, with classes for tool execution, state management, and reward calculation.
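The environment follows the familiar reset/step loop. The sketch below is a rough, dependency-free illustration of that shape only. It is hypothetical: it does not import `gptme` or `verifiers`, tools are plain Python callables standing in for gptme tools, and every field, signature, and the placeholder ±1 reward are assumptions rather than the contents of `gptme_verifiers.py`.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolResult:
    """Outcome of one tool call (hypothetical fields)."""
    success: bool
    output: str


@dataclass
class State:
    """Episode state tracked between steps (hypothetical fields)."""
    task: str
    history: list[ToolResult] = field(default_factory=list)
    steps: int = 0


class GPTMeRLEnv:
    """Minimal reset/step environment; tools are stub callables, not real gptme tools."""

    def __init__(self, tools: dict[str, Callable[[str], str]], max_steps: int = 5):
        self.tools = tools
        self.max_steps = max_steps
        self.state = State(task="")

    def reset(self, task: str) -> str:
        # Start a fresh episode and return the first observation.
        self.state = State(task=task)
        return self._get_observation()

    def step(self, tool_name: str, tool_input: str) -> tuple[str, float, bool]:
        # Run the chosen tool, record the result, and score it.
        result = self._execute_tool(tool_name, tool_input)
        self.state.history.append(result)
        self.state.steps += 1
        reward = 1.0 if result.success else -1.0  # placeholder reward (assumption)
        return self._get_observation(), reward, self._is_done()

    def _execute_tool(self, tool_name: str, tool_input: str) -> ToolResult:
        tool = self.tools.get(tool_name)
        if tool is None:
            return ToolResult(success=False, output=f"unknown tool: {tool_name}")
        try:
            return ToolResult(success=True, output=tool(tool_input))
        except Exception as exc:
            # A failing tool becomes a failed ToolResult instead of crashing the episode.
            return ToolResult(success=False, output=str(exc))

    def _get_observation(self) -> str:
        # Observation is the task text plus the latest tool output, if any.
        last = self.state.history[-1].output if self.state.history else ""
        return f"{self.state.task}\n{last}".strip()

    def _is_done(self) -> bool:
        # Episodes end after a fixed step budget (assumption).
        return self.state.steps >= self.max_steps
```

For example, with `tools={"upper": str.upper}` and `max_steps=2`, `env.reset("say hi")` returns the task text, `env.step("upper", "hi")` returns the updated observation with a reward of 1.0 and `done=False`, and a call to an unregistered tool yields a reward of -1.0 instead of raising.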
- New Script:
  - Adds `gptme_verifiers.py` to implement an RL environment for training LLMs with `gptme` and `verifiers`.
- Classes:
  - `GPTMeRLEnv`: Manages the RL environment, tool execution, and reward calculation.
  - `ToolResult`, `State`: Define types for tool execution results and environment state.
  - `ToolUseReward`, `TaskReward`: Define reward structures for tool usage and task completion.
- Functions:
  - `reset()`, `step()`, `_execute_tool()`, `_calculate_tool_reward()`, `_calculate_task_reward()`, `_get_observation()`, `_is_done()`: Manage the environment lifecycle and reward logic.
  - `main()`: Demonstrates example usage of the environment.

This description was created by for 58d1c2dcd5213a6a21859370e18d1e43f09b5f6f. It will automatically update as commits are pushed.
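The summary splits rewards into a per-call `ToolUseReward` and a completion-based `TaskReward`. One plausible way to combine them is a small dense shaping term per tool call plus a sparse terminal term. The weights, the `score` methods, the `episode_return` helper, and the naive substring check (standing in for whatever verifier the script actually uses) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ToolUseReward:
    """Per-call shaping reward (hypothetical weights)."""
    success_bonus: float = 0.1
    failure_penalty: float = -0.2

    def score(self, succeeded: bool) -> float:
        return self.success_bonus if succeeded else self.failure_penalty


@dataclass
class TaskReward:
    """Terminal reward for task completion (hypothetical check)."""
    expected: str
    completion_reward: float = 1.0

    def score(self, final_output: str) -> float:
        # Naive substring match; a real verifier would be task-specific.
        return self.completion_reward if self.expected in final_output else 0.0


def episode_return(
    tool_outcomes: list[bool],
    final_output: str,
    tool_reward: ToolUseReward,
    task_reward: TaskReward,
) -> float:
    """Sum the shaping reward for every tool call, then add the terminal task reward."""
    shaping = sum(tool_reward.score(ok) for ok in tool_outcomes)
    return shaping + task_reward.score(final_output)
```

Here `episode_return([True, False, True], "answer: 42", ToolUseReward(), TaskReward(expected="42"))` gives 0.1 - 0.2 + 0.1 + 1.0 = 1.0, up to floating-point rounding. The per-call terms are kept small relative to the completion reward so that, within a short episode, tool-use shaping stays secondary to finishing the task.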
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 67.21%. Comparing base (8845a1a) to head (58d1c2d).
:white_check_mark: All tests successful. No failed tests found.
Additional details and impacted files
Coverage Diff (master vs. #445):

| | master | #445 | +/- |
|---|---|---|---|
| Coverage | 67.21% | 67.21% | |
| Files | 71 | 71 | |
| Lines | 6304 | 6304 | |
| Hits | 4237 | 4237 | |
| Misses | 2067 | 2067 | |

| Flag | Coverage Δ |
|---|---|
| anthropic/claude-3-haiku-20240307 | 65.97% <ø> (ø) |
| openai/gpt-4o-mini | 65.41% <ø> (ø) |

Flags with carried forward coverage won't be shown.
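The headline percentage follows from the Coverage Diff counts. Codecov computes coverage as hits over hits + misses + partials. Here Hits + Misses = 4237 + 2067 = 6304, which equals Lines, so there are no partially covered lines. A quick check, where `line_coverage` is a hypothetical helper and not part of any Codecov tooling:

```python
def line_coverage(hits: int, misses: int, partials: int = 0) -> float:
    """Coverage percentage: hit lines over all coverable lines."""
    total = hits + misses + partials
    return 100.0 * hits / total if total else 0.0


hits, misses, lines = 4237, 2067, 6304
assert hits + misses == lines  # no partially covered lines in this report
print(f"{line_coverage(hits, misses):.2f}%")  # prints 67.21%
```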