gptme often tries to use python instead of ipython to execute python code

Open Miyou opened this issue 11 months ago • 11 comments

I feel like this could be solved by adding some extra prompting around this

Miyou avatar Dec 12 '24 14:12 Miyou

We used to just use ```python and ```bash for the python and shell tools, but had to change it since the model kept trying/asking to run non-runnable code snippets/examples it showed using the same markup. I later changed it to just ipython/shell respectively, which works much better, but it still sometimes gets it wrong. I tried different prompts, but it's hard to measure what has an impact since the failure is somewhat rare/random.

See https://github.com/ErikBjare/gptme/issues/67 for backstory
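
To illustrate the distinction described above, here is a minimal sketch of how reserving separate fence tags separates runnable blocks from display-only examples. This is not gptme's actual parser: `RUNNABLE_TAGS`, `FENCE_RE`, and `split_blocks` are hypothetical names, and the only detail taken from this thread is that the runnable tags are ipython/shell rather than python/bash.

```python
import re

# Built programmatically so this example can itself live inside a fenced block.
FENCE = "`" * 3

# Assumption: only these tags mark a block as a tool call (per the thread,
# the tags were changed from python/bash to ipython/shell).
RUNNABLE_TAGS = {"ipython", "shell"}

# Matches an opening fence with a tag, the body, and the closing fence.
FENCE_RE = re.compile(FENCE + r"(\w+)\n(.*?)" + FENCE, re.DOTALL)


def split_blocks(message: str):
    """Return (runnable, display_only) lists of (tag, code) pairs."""
    runnable, display_only = [], []
    for tag, code in FENCE_RE.findall(message):
        target = runnable if tag in RUNNABLE_TAGS else display_only
        target.append((tag, code.strip()))
    return runnable, display_only
```

Under this scheme a block tagged python is treated as an example only, which is exactly why a model that habitually emits the python tag ends up with code that is shown but never executed.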

ErikBjare avatar Dec 12 '24 19:12 ErikBjare

Perhaps not using ``` for tool execution could be a solution. Maybe use XML tags instead?

Miyou avatar Dec 13 '24 09:12 Miyou

@Miyou you can try that with the --tool-format xml parameter

ErikBjare avatar Dec 13 '24 10:12 ErikBjare

Maybe changing the name of the tool from python -> ipython?

bjsi avatar Dec 17 '24 08:12 bjsi

@bjsi Yeah I think that might help, could eval some small model which struggles and see if it helps.

ErikBjare avatar Dec 18 '24 15:12 ErikBjare

@Miyou Try using latest gptme master with --tool-format tool now that we merged #306

Since it skips the markdown format altogether, it shouldn't suffer from this issue.

ErikBjare avatar Dec 21 '24 15:12 ErikBjare

I haven't seen this on the master branch since merging #361. I just bumped into it when trying uvx gptme (for a show-off tweet), which uses the latest release, v0.25.0, and that release doesn't have the change.

I should make a new release, which will hopefully fix the issue for good.

ErikBjare avatar Jan 10 '25 20:01 ErikBjare

Just released https://github.com/ErikBjare/gptme/releases/tag/v0.26.0

Let me know if it still happens!

ErikBjare avatar Jan 14 '25 13:01 ErikBjare

Just happened to me again :(

Although it had just mentioned the still-not-renamed python.py file:

- gptme/tools/python.py (handle completion signal)

Might still be a lot more rare after the tool rename, should perhaps rename the tool file too.

ErikBjare avatar Jan 14 '25 18:01 ErikBjare

Would it work better by formatting tool calls using xml?

bjsi avatar Jan 14 '25 18:01 bjsi

@bjsi Possibly, I haven't tested it extensively. I assume different models struggle to different extents as well (but I almost only run Claude 3.6).

ErikBjare avatar Jan 14 '25 18:01 ErikBjare

Status Update (2025-10-17)

Current Situation:

  • PR #361 (merged 2025-01-05) renamed python tool to ipython
  • This improved the situation but didn't fully resolve it
  • Erik reported it still happens occasionally (last report: 2025-01-14)

Potential Solutions:

  1. XML tool format (--tool-format xml) - suggested by @Miyou and @bjsi
  2. Tool call format (--tool-format tool) - mentioned by Erik
  3. Renaming the tool file itself (not just the tool name)

Next Steps: To make progress on this, we need systematic evaluation:

  • Test with different models (Claude 3.6, GPT-4, etc.)
  • Compare behavior with different tool formats (markdown, xml, tool)
  • Measure frequency/impact with each configuration
  • Document which approach works best
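
The measurement step could be sketched roughly as follows. This is a hypothetical tally, not an existing gptme eval: `uses_python_fence` and `misfire_rates` are made-up names, and the `(model, tool_format, response)` record shape is an assumption. It only makes sense for prompts where code execution was expected, since a python-tagged example block is otherwise legitimate.

```python
from collections import defaultdict

# Built programmatically so this example can itself live inside a fenced block.
FENCE = "`" * 3


def uses_python_fence(response: str) -> bool:
    """True if any line opens a fence tagged exactly `python`."""
    return any(line.strip() == FENCE + "python" for line in response.splitlines())


def misfire_rates(records):
    """Compute the share of responses with a python-tagged fence per configuration.

    records: iterable of (model, tool_format, response) tuples (assumed shape).
    Returns a dict mapping (model, tool_format) to a fraction in [0, 1].
    """
    totals = defaultdict(int)
    misfires = defaultdict(int)
    for model, tool_format, response in records:
        key = (model, tool_format)
        totals[key] += 1
        misfires[key] += uses_python_fence(response)
    return {key: misfires[key] / totals[key] for key in totals}
```

Because the failure is rare, each configuration would likely need many sampled responses before the rates become comparable.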

The issue remains open as it's still reproducible, though less frequent than before the rename.

TimeToBuildBob avatar Oct 17 '25 12:10 TimeToBuildBob

I never see this anymore with modern models like Claude 3.6, Claude 4.5, etc.

Still probably happens with small local models that are overtrained on ```python

ErikBjare avatar Oct 17 '25 12:10 ErikBjare