open-interpreter
High LLM Costs
Describe the bug
The script should show pricing or warn the user about the costs being incurred. OpenAI's billing limits don't work, so it is easy to exceed your budget.
Reproduce
- Run with GPT4
- Get REKT
Expected behavior
A warning about the number of tokens consumed, or a prompt asking for a budget.
Screenshots
No response
Open Interpreter version
0.1.7
Python version
3.11.5
Operating System name and version
Ubuntu 22.04
Additional context
Really, I'm broke now, and it's your fault!
WTH $50???? I have embedded so many documents and generated so much content but still never crossed more than $30 YTD! What did you run?
There are quite a few precautions listed in the readme about letting it run in autopilot. I think this can be closed.
It definitely needs to display a warning or show the cost incurred with every turn. I vote to keep it open as a feature request.
I also ran up $20 in just the first 15 minutes, and this after never even reaching $5 a month on pipegpt and others with daily use for a few months now. I can only presume it sends a lot of context with every try.
And I ran up this bill by just trying to get it to run code - not even on autopilot - code that never executed due to it not being able to interpret the end quotes for some reason.
(The end quotes seem to end up inside the code block, causing errors. Also, short outputs never terminate. Not sure how this project got so popular - I presume it works out of the box on OSX, but that's definitely not the case on Linux.)
Because of this I'm taking my Saturday morning to dig into the code - it's nice and portable - I'm just not sure why curses is needed, or if it can be bypassed. I'm sure once I understand it, it will make debugging easier, but for now it's just a hindrance!
Also concur about OpenAI billing limits not being honored. I set top-ups of $5 and a cap at $10 and noticed charges going off my card every few minutes. I shudder to think what would happen to someone who set it up thinking they're safe.
There is a max_budget flag that can be set
interpreter --max_budget 0.01 # < This is in USD!
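If you're driving Open Interpreter from Python instead of the CLI, something along these lines may work too - this is only a sketch assuming the 0.1.x convention that CLI flags map to module-level attributes, so check the docs for the exact attribute name:

import interpreter

interpreter.model = "gpt-4"
interpreter.max_budget = 0.01  # assumption: mirrors the --max_budget CLI flag (USD)
interpreter.chat("How many files are in the current directory?")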
If you pull down the latest main branch, you can test out the %tokens magic command that attempts to estimate your token usage and cost based on the current conversation. You can run it at any point and it will give you an estimate of how many tokens will be sent along with the next message you send.
It’s not perfect, but it is an attempt to give more information about token usage and help users understand how many tokens are being used behind the scenes to power the conversation.
There's also #614, which proposes adding the ability to estimate the token usage of a prompt via the %tokens magic command before you send it.
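For anyone who wants to try it, the magic command is typed directly at the interactive prompt (illustrative only; the exact output wording may differ):

interpreter   # start the CLI as usual
%tokens       # at any point, prints an estimate of the tokens (and approximate cost) for your next message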
Hey there, @ubuntuyeah!
Between the max_budget parameter and the %tokens magic command, I think Open Interpreter is starting to address this.
Have you tried any of these features? Do they help resolve your Issue?
Looking at the debug output, it seems to be passing the system prompt to the API many times, not the normal once per call. It looks like it sends a couple of calls mid-conversation, and each one includes that text, so if you wrote a long system prompt it is being sent over and over. I also blew through a lot in one day because of this app.
Another problem I had is that it likes generating Python code with explicit loops, and print statements inside the loop. This racks up a lot of tokens quickly. My attempts at getting it not to do this weren't very successful. Ideally it would use bulk operations (vector, map, fold/reduce), since loops are slow in Python, and it would just print out the final result rather than every intermediate step.
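To illustrate the difference (a hypothetical snippet, not actual Open Interpreter output): printing inside a loop produces one line per iteration, and all of that output gets fed back into the model's context, while a bulk operation produces a single short result.

import random

data = [random.random() for _ in range(10_000)]

# Token-heavy pattern: explicit loop with a print on every iteration.
# Every printed line becomes output that is sent back to the model.
total = 0.0
for x in data:
    total += x
    print(total)  # 10,000 lines of intermediate output

# Cheaper pattern: one bulk operation, one line of output.
print(sum(data))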
I had the same issue... $50 USD in a few hours, so I researched how to control it and built a Python script with standardized params that relate to high token usage/costs:
import subprocess

# Define the options
options = {
    "model": "gpt-4",
    "temperature": "0.1",
    "context_window": "1000",
    "max_tokens": "500",
    "max_budget": "5.00",
    "safe_mode": "ask",
    "config_file": "'/path/to/your/config.yaml'",
}

# Convert options to command-line arguments
cmd_options = []
for key, value in options.items():
    cmd_options.append(f"--{key} {value}")

# Combine into a single command string
cmd = "interpreter " + " ".join(cmd_options)

# Execute the command
subprocess.run(cmd, shell=True)
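If the nested quoting around the config path ever gets in the way, a small variation on the same idea is to pass the arguments as a list, so subprocess skips the shell and no extra quoting is needed (same flags, path still a placeholder):

import subprocess

# Same options as above, passed as a list so no shell quoting is needed.
subprocess.run([
    "interpreter",
    "--model", "gpt-4",
    "--temperature", "0.1",
    "--context_window", "1000",
    "--max_tokens", "500",
    "--max_budget", "5.00",
    "--safe_mode", "ask",
    "--config_file", "/path/to/your/config.yaml",
])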
@habbler That's an interesting insight that I've also noticed but haven't had a chance to put any brainpower towards.
Do you have some examples of what you've already tried that wasn't helping it?
On the one hand, the explicit code is probably easier for less experienced devs to understand, but on the other hand, I would love to have it use more advanced approaches.
I'm not sure how capable the models are in terms of producing more performant code, though.
@grexzen I mentioned something similar in a comment in another Issue, but what you see in the debug output is not necessarily what's being sent to OpenAI. You'll want to take a look at the messages array in the LiteLLM debugging output to see what is actually sent.
You may see duplicate prompts if you look at the entire debugging output because the response also includes an input array of messages that were sent to generate the chat completion.
If you have an example where prompts were sent multiple times, please let us know so we can investigate what might have happened.
max_budget addresses this.
Closing this stale issue. Please create a new issue if the problem is not resolved or explained in the documentation. Thanks!