fix: remove system outputs from tool examples in system prompt

Open ErikBjare opened this issue 11 months ago • 1 comments

I noticed that even good models like Sonnet hallucinated System outputs when run in non-streaming mode (this might be fixed by #306).

Not sure if this is actually better; it should ideally be checked with evals.

TODO:

  • [ ] Run tests in non-streaming mode to make sure it's not broken.
  • [ ] Run evals in non-streaming mode.

[!IMPORTANT] Enhance example cleaning by adding strip_system parameter to remove system messages in get_examples and clean_example.

  • Behavior:
    • get_examples in base.py now accepts strip_system parameter to remove system messages from examples.
    • clean_example in __init__.py updated to handle strip_system parameter, filtering out system messages and their content.
  • Functions:
    • get_tool_prompt in base.py calls get_examples with strip_system=True to exclude system messages from prompts.
  • Misc:
    • Minor refactoring in get_examples and clean_example to support new functionality.
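The filtering described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: it assumes each message in an example starts with a `> Role:` header line and runs until the next header, and the helper name `strip_system_messages` is hypothetical. The actual `clean_example` and `get_examples` in gptme may parse examples differently.

```python
import re

# Assumed format: each message starts with a "> Role:" header line and
# continues until the next header. Illustration only, not gptme's code.
ROLE_HEADER = re.compile(r"^> (User|Assistant|System):")


def strip_system_messages(example: str) -> str:
    """Drop every System message (its header and continuation lines)."""
    kept: list[str] = []
    in_system = False
    for line in example.splitlines():
        match = ROLE_HEADER.match(line)
        if match:
            # A new message starts; track whether it is a System message.
            in_system = match.group(1) == "System"
        if not in_system:
            kept.append(line)
    return "\n".join(kept)


example = """> User: list the files
> Assistant: Sure.
ls
> System: Ran command.
file1.txt
file2.txt
> User: thanks"""

cleaned = strip_system_messages(example)
print(cleaned)
```

With system messages removed, the prompt examples show only the user request and the assistant's tool call, so the model has no System output in the examples to imitate.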

This description was created by Ellipsis for 0b1d259cc145b92e92c24c57f7e2fdb17df7abfb. It will automatically update as commits are pushed.

ErikBjare, Dec 10 '24 22:12

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 72.94%. Comparing base (65efc3e) to head (0b1d259).

:white_check_mark: All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #320      +/-   ##
==========================================
+ Coverage   72.49%   72.94%   +0.45%     
==========================================
  Files          67       67              
  Lines        4912     4931      +19     
==========================================
+ Hits         3561     3597      +36     
+ Misses       1351     1334      -17     
| Flag | Coverage Δ | |
|---|---|---|
| anthropic/claude-3-haiku-20240307 | 71.02% <100.00%> (+0.47%) | :arrow_up: |
| openai/gpt-4o-mini | 69.51% <100.00%> (+0.11%) | :arrow_up: |

Flags with carried forward coverage won't be shown.


codecov-commenter, Dec 10 '24 22:12