open-interpreter
High LLM Costs
Describe the bug
The script should show pricing or warn the user about the costs being incurred. OpenAI's billing limits don't work, so it is easy to exceed your budget.
Reproduce
- Run with GPT4
- Get REKT
Expected behavior
A warning about the number of tokens consumed, or a prompt asking for a budget.
Screenshots
No response
Open Interpreter version
0.1.7
Python version
3.11.5
Operating System name and version
Ubuntu 22.04
Additional context
Really, I'm broke now, and it's your fault!
WTH $50???? I have embedded so many documents and generated so much content but still never crossed more than $30 YTD! What did you run?
There are quite a few precautions listed in the readme about letting it run in autopilot. I think this can be closed.
It definitely needs to display a warning or show the cost incurred with every turn. I vote to keep it open as a feature request.
I also ran up $20 in just the first 15 minutes, and this after never even reaching $5 a month on pipegpt and others with daily use for a few months now. I can only presume it sends a lot of context with every try.
And I ran up this bill by just trying to get it to run code - not even on autopilot - code that never executed due to it not being able to interpret the end quotes for some reason.
(The end quotes seem to end up inside the code block, causing errors. Also, short outputs never terminate. Not sure how this project got so popular - I presume it works out of the box on OSX, but that's definitely not the case on Linux.)
Because of this I'm taking my Saturday morning to dig into the code - it's nice and portable - I'm just not sure why curses is needed, or if it can be bypassed. I'm sure once I understand it, it will make debugging easier, but for now it's just a hindrance!
Also concur about OpenAI billing limits not being honored. I set top-ups of $5 and a cap at $10 and noticed charges going off my card every few minutes. I shudder to think what would happen to someone who set it up thinking they're safe.
There is a max_budget flag that can be set
interpreter --max_budget 0.01 # < This is in USD!
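If you're driving Open Interpreter from Python instead of the CLI, something along these lines may work too - this is only a sketch assuming the 0.1.x convention that CLI flags map to module-level attributes, so check the docs for the exact attribute name:

import interpreter

interpreter.model = "gpt-4"
interpreter.max_budget = 0.01  # assumption: mirrors the --max_budget CLI flag (USD)
interpreter.chat("How many files are in the current directory?")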
If you pull down the latest main branch, you can test out the %tokens magic command that attempts to estimate your token usage and cost based on the current conversation. You can run it at any point and it will give you an estimate of how many tokens will be sent along with the next message you send.
It’s not perfect, but it is an attempt to give more information about token usage and help users understand how many tokens are being used behind the scenes to power the conversation.
There's also #614, which proposes adding the ability to estimate the token usage of a prompt via the %tokens magic command before you send it.
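For anyone who wants to try it, the magic command is typed directly at the interactive prompt (illustrative only; the exact output wording may differ):

interpreter   # start the CLI as usual
%tokens       # at any point, prints an estimate of the tokens (and approximate cost) for your next message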
Hey there, @ubuntuyeah!
Between the max_budget parameter and the %tokens magic command, I think Open Interpreter is starting to address this.
Have you tried any of these features? Do they help resolve your Issue?
Looking at the debug output, it seems to be passing the system prompt to the API many times, not the normal once per call. It looks like it sends a couple of calls mid-conversation, and each one includes that text, so if you wrote a long system prompt it is being sent over and over. I also blew through a lot in one day because of this app.
Another problem I had is that it likes generating Python code with explicit loops, and print statements inside the loop. This racks up a lot of tokens quickly. My attempts at getting it not to do this weren't very successful. Ideally it would use bulk operations (vector, map, fold/reduce), since loops are slow in Python, and it would just print out the final result rather than every intermediate step.
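To illustrate the difference (a hypothetical snippet, not actual Open Interpreter output): printing inside a loop produces one line per iteration, and all of that output gets fed back into the model's context, while a bulk operation produces a single short result.

import random

data = [random.random() for _ in range(10_000)]

# Token-heavy pattern: explicit loop with a print on every iteration.
# Every printed line becomes output that is sent back to the model.
total = 0.0
for x in data:
    total += x
    print(total)  # 10,000 lines of intermediate output

# Cheaper pattern: one bulk operation, one line of output.
print(sum(data))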
I had the same issue... $50 USD in a few hours, so I researched how to control it and built a Python script with standardized params that relate to high token usage/costs:
import subprocess

# Define the options
options = {
    "model": "gpt-4",
    "temperature": "0.1",
    "context_window": "1000",
    "max_tokens": "500",
    "max_budget": "5.00",
    "safe_mode": "ask",
    "config_file": "'/path/to/your/config.yaml'",
}

# Convert options to command-line arguments
cmd_options = []
for key, value in options.items():
    cmd_options.append(f"--{key} {value}")

# Combine into a single command string
cmd = "interpreter " + " ".join(cmd_options)

# Execute the command
subprocess.run(cmd, shell=True)
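If the nested quoting around the config path ever gets in the way, a small variation on the same idea is to pass the arguments as a list, so subprocess skips the shell and no extra quoting is needed (same flags, path still a placeholder):

import subprocess

# Same options as above, passed as a list so no shell quoting is needed.
subprocess.run([
    "interpreter",
    "--model", "gpt-4",
    "--temperature", "0.1",
    "--context_window", "1000",
    "--max_tokens", "500",
    "--max_budget", "5.00",
    "--safe_mode", "ask",
    "--config_file", "/path/to/your/config.yaml",
])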
@habbler That's an interesting insight that I've also noticed but haven't had a chance to put any brainpower towards.
Do you have some examples of what you've already tried that wasn't helping it?
On the one hand, the explicit code is probably easier for less experienced devs to understand, but on the other hand, I would love to have it use more advanced approaches.
I'm not sure how capable the models are in terms of producing more performant code, though.
@grexzen I mentioned something similar in a comment in another Issue, but what you see in the debug output is not necessarily what's being sent to OpenAI. You'll want to take a look at the messages array in the LiteLLM debugging output to see what is actually sent.
You may see duplicate prompts if you look at the entire debugging output because the response also includes an input array of messages that were sent to generate the chat completion.
If you have an example where prompts were sent multiple times, please let us know so we can investigate what might have happened.
max_budget addresses this.
Closing this stale issue. Please create a new issue if the problem is not resolved or explained in the documentation. Thanks!