
Function to calculate number of tokens

Open rmilkowski opened this issue 2 years ago • 9 comments

Describe the feature or improvement you're requesting

It would be useful if the module provided a function to calculate the number of tokens in a given prompt for a given model, without having to use a third-party module to do so.

One example of when this would be useful is trimming the history fed to the model when the entire prompt (plus max_tokens) exceeds the model's context limit, before sending the query.
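For illustration, this is roughly what everyone has to hand-roll today with tiktoken (a sketch only; the model name, context limit, and reserved completion budget below are assumptions, and per-message overhead tokens are ignored):

import tiktoken

MODEL_LIMIT = 4096  # assumed context window of the target model
MAX_TOKENS = 512    # tokens reserved for the completion

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def trim_history(messages):
    # Drop the oldest messages until prompt + max_tokens fits the limit.
    def prompt_tokens():
        return sum(len(enc.encode(m["content"])) for m in messages)
    while messages and prompt_tokens() + MAX_TOKENS > MODEL_LIMIT:
        messages.pop(0)
    return messages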

Additional context

No response

rmilkowski avatar Apr 23 '23 09:04 rmilkowski

Hi @rmilkowski

If you are using the Python SDK, tiktoken is the library created by OpenAI itself to tokenise and count tokens: https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb

Also, check the following link, which describes how to use the token count: https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb
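For example, counting the tokens in a prompt string is a few lines (the model name here is just an illustration):

import tiktoken

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
num_tokens = len(encoding.encode("How many tokens is this prompt?"))
print(num_tokens)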

Please check if this helps.

lewiswilliam avatar Apr 24 '23 12:04 lewiswilliam

Fair enough, although it's a shame they provide example code for handling the different chat models instead of a function that deals with it, so users wouldn't have to keep their own code updated and would pick up new models simply by upgrading the module. I submitted https://github.com/openai/tiktoken/issues/115

rmilkowski avatar Apr 26 '23 10:04 rmilkowski

I agree. Even an API call that calculates tokens based on the model and text input would be super valuable.

StephenHodgson avatar Nov 11 '23 12:11 StephenHodgson

It is at least somewhat surprising that this feature is not available. Tiktoken is a third-party tokenizer that tries to stay compatible with OpenAI's endpoints, but a tokenizer is not a token counter. The issue linked above demonstrates that you cannot even use tiktoken to easily calculate tokens for the function calling API (which has since been changed to tools, further compounding the problem). Clearly it should not be the responsibility of a third-party application to keep up with OpenAI's API changes in order to provide the basic functionality that makes the API usable.

It's difficult to batch requests in applications when you aren't certain exactly what the size of your batches is.
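For plain chat messages there is at least the cookbook heuristic, sketched below (the per-message constants are cookbook values and may drift as the API changes; nothing comparable is documented for tools):

import tiktoken

def num_tokens_from_messages(messages, model="gpt-3.5-turbo"):
    # Adapted from the OpenAI cookbook: each message carries a fixed
    # overhead, and every reply is primed with a few extra tokens.
    encoding = tiktoken.encoding_for_model(model)
    tokens_per_message = 3  # cookbook value, not guaranteed stable
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for value in message.values():
            num_tokens += len(encoding.encode(value))
    return num_tokens + 3  # reply is primed with <|start|>assistant<|message|>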

AdventLee avatar Dec 06 '23 16:12 AdventLee

This is a great solution, and it's useful:

https://gist.github.com/CGamesPlay/dd4f108f27e2eec145eedf5c717318f5

Usage:

import tiktoken

base_usage = ...  # message tokens + image tokens
tools = [...]     # a list of tool definitions

encoder = tiktoken.encoding_for_model("gpt-4-1106-preview")
token_length = lambda x: len(encoder.encode(x))

FUNCTION_OVERHEAD = 12  # fixed overhead when any tools are attached

# `format_tool` is defined in the gist above:
# https://gist.github.com/CGamesPlay/dd4f108f27e2eec145eedf5c717318f5
total_tokens = base_usage + FUNCTION_OVERHEAD + sum(
    token_length(format_tool(t["function"])) for t in tools
)
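One way to sanity-check an estimate like this is to compare it against the usage.prompt_tokens field the API itself returns (a sketch; assumes OPENAI_API_KEY is set in the environment and `tools` and `total_tokens` come from the snippet above):

from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[{"role": "user", "content": "ping"}],
    tools=tools,
)
# The server's count is the ground truth; compare it with the estimate above.
print(response.usage.prompt_tokens, "vs estimated", total_tokens)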

jussker avatar Dec 15 '23 17:12 jussker

Not only should this library allow us to calculate the number of tokens we pass and receive, we should also be able to restrict the number of tokens in some way. I don't see why this feature shouldn't be implemented, except that not implementing it makes customers pay more, unnecessarily in many cases.
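In the meantime, the closest you can get is a client-side guardrail (a sketch; the model name and budget numbers below are arbitrary assumptions):

import tiktoken
from openai import OpenAI

INPUT_BUDGET = 2000  # arbitrary per-request input cap

client = OpenAI()
enc = tiktoken.encoding_for_model("gpt-4")

def guarded_completion(prompt):
    # Refuse over-budget inputs before they incur any cost.
    if len(enc.encode(prompt)) > INPUT_BUDGET:
        raise ValueError("prompt exceeds the per-request input budget")
    return client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,  # caps the completion, bounding output cost
    )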

nbro10 avatar Feb 13 '24 16:02 nbro10

Personally, I believe this should actually be done server-side, via an API endpoint.

StephenHodgson avatar Feb 13 '24 16:02 StephenHodgson

There's already a way to limit costs or used credits at the OpenAI account level, but not per request (I think).

nbro10 avatar Feb 13 '24 16:02 nbro10

If OpenAI isn't planning to provide this functionality, could you at least provide some clear information on how tokens are counted for function calls, tool calls, structured outputs, etc.? How are we supposed to calculate token usage?
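Until that documentation exists, the only reliable count seems to be the usage block the API returns; with streaming you can request it explicitly (a sketch, assuming a recent openai-python):

from openai import OpenAI

client = OpenAI()
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "hello"}],
    stream=True,
    stream_options={"include_usage": True},  # final chunk carries usage
)
for chunk in stream:
    if chunk.usage is not None:  # only the last chunk has usage set
        print(chunk.usage.prompt_tokens, chunk.usage.completion_tokens)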

ldorigo avatar Sep 02 '24 08:09 ldorigo