openai-cookbook
not clear which encoding to use with gpt-3.5-turbo
I don't see where it says which encoding to use with gpt-3.5-turbo. Can you add that explicitly on both the tiktoken and the turbo pages?
Yes, will do. Use cl100k_base as the encoding.
And if you use tiktoken to count tokens for ChatGPT API calls, for now you can add 4 to the lengths of the content and name fields, per message.
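As a rough illustration of that interim rule, here is a minimal sketch. The helper name `estimate_chat_tokens` and its `encode` parameter are assumptions made for this example and are not part of tiktoken. The demo uses a whitespace splitter as a stand-in tokenizer so it runs without any downloads, which means its counts will not match real ones. For real counts, pass `tiktoken.get_encoding("cl100k_base").encode` instead.

```python
from typing import Callable, Dict, List, Sequence


def estimate_chat_tokens(
    messages: List[Dict[str, str]],
    encode: Callable[[str], Sequence],
    per_message_overhead: int = 4,
) -> int:
    """Interim estimate: tokens in content and name fields, plus 4 per message."""
    total = 0
    for message in messages:
        total += per_message_overhead  # fixed overhead suggested in this thread
        for field in ("content", "name"):
            if field in message:
                total += len(encode(message[field]))
    return total


# Stand-in tokenizer (assumption for the demo): one "token" per word.
# With tiktoken installed, use: encode = tiktoken.get_encoding("cl100k_base").encode
encode = str.split

demo = [{"role": "user", "content": "Where was it played?"}]
print(estimate_chat_tokens(demo, encode))  # 4 words + 4 overhead = 8
```

Passing the encoder in as an argument keeps the arithmetic separate from the tokenizer, so the same helper works with any encoding.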
Is there an example? What I am using here is wrong
import tiktoken

def num_tokens_from_string(string: str, encoding_name: str) -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

print(num_tokens_from_string("""{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Who won the world series in 2020?"},
{"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
{"role": "user", "content": "Where was it played?"}""", "cl100k_base"))
We will have an update in the docs soon to make the counting more accurate.
In the meantime, you can use:
import tiktoken

def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0301"):
    """Returns the number of tokens used by a list of messages."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        encoding = tiktoken.get_encoding("cl100k_base")
    if model == "gpt-3.5-turbo-0301":
        num_tokens = 0
        for message in messages:
            num_tokens += 4  # every message follows <im_start>{role/name}\n{content}<im_end>\n
            for key, value in message.items():
                num_tokens += len(encoding.encode(value))
                if key == "name":  # if there's a name, the role is omitted
                    num_tokens += -1  # role is always required and always 1 token
        num_tokens += 2  # every reply is primed with <im_start>assistant
        return num_tokens
    else:
        raise NotImplementedError("""num_tokens_from_messages() is not implemented for this model.
See https://github.com/openai/openai-python/blob/main/chatml.md for information on how messages are converted to tokens.""")
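To sanity-check that accounting by hand, the sketch below splits the total into its separate terms. The names `token_breakdown` and `whitespace_encode` are hypothetical, introduced only for this example. The whitespace splitter stands in for a real tokenizer so the arithmetic can be followed on paper, and its counts will differ from what cl100k_base produces.

```python
from typing import Callable, Dict, List, Sequence


def whitespace_encode(text: str) -> List[str]:
    """Stand-in tokenizer (assumption): one token per whitespace-separated word."""
    return text.split()


def token_breakdown(
    messages: List[Dict[str, str]],
    encode: Callable[[str], Sequence],
) -> Dict[str, int]:
    """Apply the gpt-3.5-turbo-0301 rules from this thread, keeping each term visible."""
    parts = {
        # 4 tokens of framing per message
        "per_message": 4 * len(messages),
        # every field value (role, content, name) is encoded and counted
        "field_tokens": sum(len(encode(v)) for m in messages for v in m.values()),
        # when a name is present the role is omitted, so subtract its 1 token
        "name_adjustment": -sum(1 for m in messages if "name" in m),
        # every reply is primed with <im_start>assistant
        "reply_priming": 2,
    }
    parts["total"] = sum(parts.values())
    return parts


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
    {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
    {"role": "user", "content": "Where was it played?"},
]

breakdown = token_breakdown(messages, whitespace_encode)
print(breakdown)
```

With the stand-in tokenizer the four messages contribute 16 framing tokens and 30 field tokens, plus 2 for reply priming, for a total of 48. Swapping in `tiktoken.get_encoding("cl100k_base").encode` should give the same total as the function above, under the same per-model assumptions.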
Updated here: https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb