Tracking Token Usage
Is your feature request related to a problem? Please describe.
I need to keep track of my various clients' token usage. The OpenAI output reports completion_tokens in the following format: "prompt_tokens": 123, "completion_tokens": 55, "total_tokens": 178
And it would be super cool if those could be in the output variables.
Describe the solution you'd like
program()["completion_tokens"] = 81 etc.
Describe alternatives you've considered
Tokenizing the input and output myself. Could not find a way to get the raw input and output going to OpenAI in order to count the tokens.
any updates?
I need this as well.
Great suggestion! Out of curiosity, if guidance needs to make multiple API calls across a program, would you prefer a granular per-call breakdown or just a simple aggregation of all the calls?
I would prefer a per-call breakdown since I could always aggregate it later if I needed to. Would asking for both be feasible?
I would like to have this feature as well. I tried to see if I could add it myself, but to my surprise, in chat-completion streaming mode the 'usage' object is missing when I print the raw OpenAI response.
So I tried a second, cruder approach: calculating the tokens from the prompt and the final response (probably not what everyone wants, but it would work for me), but I was unable to figure out how to return that data while streaming.
Any hints on how to do this?
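For anyone who wants that fallback in the meantime, here is a minimal sketch of the count-it-yourself approach using tiktoken. The model name and sample strings are placeholders, and note this is an approximation: real chat completions add a few tokens of per-message overhead on top of the raw text.

```python
# Count tokens locally with tiktoken when the streaming response
# omits the 'usage' object.
import tiktoken

def count_tokens(text: str, model: str = "gpt-3.5-turbo") -> int:
    try:
        enc = tiktoken.encoding_for_model(model)
    except KeyError:
        # Unknown model name: fall back to a general-purpose encoding.
        enc = tiktoken.get_encoding("cl100k_base")
    return len(enc.encode(text))

# Placeholders standing in for the captured prompt and the text
# assembled from the stream chunks.
prompt = "Tell me a joke about tokens."
completion = "Why did the token stop? It hit the context window."

usage = {
    "prompt_tokens": count_tokens(prompt),
    "completion_tokens": count_tokens(completion),
}
usage["total_tokens"] = usage["prompt_tokens"] + usage["completion_tokens"]
print(usage)
```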
+1, would also love to be able to get the prompt_tokens prior to executing, so that we can detect how many tokens the completion could have (for error handling). A sketch of such a pre-flight check follows below.
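Something like this should already be doable outside guidance, since it only needs the prompt text; the context-window size here is an assumption for the example model, so substitute your model's real limit:

```python
# Pre-flight check: count the prompt's tokens before executing so you
# know how much room is left for the completion (and can error out early).
import tiktoken

CONTEXT_WINDOW = 8192  # assumed limit for the example model; check yours

def remaining_completion_budget(prompt: str, model: str = "gpt-4") -> int:
    enc = tiktoken.encoding_for_model(model)
    return CONTEXT_WINDOW - len(enc.encode(prompt))

budget = remaining_completion_budget("Summarize the following document: ...")
if budget <= 0:
    raise ValueError("Prompt already exceeds the context window")
print(f"Up to {budget} tokens available for the completion")
```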
100% agree that this would be super useful. As a real-world example, I have written a token wallet that allocates and keeps track of token usage per user. It would be incredibly useful if I had a callback that is triggered before and after each new LLM call. This way I could withdraw the tokens from the token wallet at each step and bail the execution early if the wallet is empty.
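To make that concrete, here is a sketch of such a wallet; the before_llm_call/after_llm_call hooks are hypothetical (they are exactly what is being requested here, not something guidance exposes today):

```python
# Sketch of a per-user token wallet driven by hypothetical pre/post
# LLM-call callbacks; the hook names do not exist in guidance today.
class EmptyWalletError(Exception):
    pass

class TokenWallet:
    def __init__(self, balance: int):
        self.balance = balance

    def before_llm_call(self, estimated_prompt_tokens: int) -> None:
        # Bail out early if the user cannot even afford the prompt.
        if self.balance < estimated_prompt_tokens:
            raise EmptyWalletError("insufficient token balance")

    def after_llm_call(self, usage: dict) -> None:
        # Withdraw the actual spend reported by the API.
        self.balance -= usage["total_tokens"]
        if self.balance < 0:
            raise EmptyWalletError("token balance exhausted mid-program")

wallet = TokenWallet(balance=10_000)
wallet.before_llm_call(estimated_prompt_tokens=123)
wallet.after_llm_call({"prompt_tokens": 123, "completion_tokens": 55, "total_tokens": 178})
print(wallet.balance)  # 9822
```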
Bump. This would be very useful to have.
+1, would be mega-useful!
In the meantime, I've just modified the `OpenAIResponse` class in `openai_response.py` by adding `print(data['usage'])` at the end 😅
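Same idea, but collecting instead of printing, in case anyone wants to aggregate afterwards; the shape of `data` follows the comment above and is not verified against current guidance source:

```python
# Collect usage records into a module-level list instead of printing,
# so they can be aggregated per program run. `data` is assumed to be
# the parsed OpenAI response dict mentioned above.
USAGE_LOG: list[dict] = []

def record_usage(data: dict) -> None:
    usage = data.get("usage")
    if usage is not None:
        USAGE_LOG.append(usage)
```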
+1 here too, I'm trying to track latency and token counts to compare different prompts / guidance programs.
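Until this lands, one workaround is to time and inspect a raw call yourself, outside guidance; this sketch uses the pre-1.0 openai SDK's ChatCompletion call, and the model/messages are placeholders:

```python
# Measure latency and token usage around a raw chat-completion call
# (pre-1.0 openai SDK). Adapt to however your guidance program calls out.
import time
import openai

def timed_chat_call(messages, model="gpt-3.5-turbo"):
    start = time.perf_counter()
    response = openai.ChatCompletion.create(model=model, messages=messages)
    latency = time.perf_counter() - start
    usage = response["usage"]  # prompt_tokens / completion_tokens / total_tokens
    return response, {"latency_s": latency, **usage}

_, stats = timed_chat_call([{"role": "user", "content": "Hello!"}])
print(stats)
```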
Has there been any progress? I skimmed the pull requests and found nothing.