
num_tokens_from_messages off by 1 token count

Open jwchang0206 opened this issue 2 years ago • 3 comments

Today I found that num_tokens_from_messages returns a count that is off by one when I calculate the prompt token size for gpt-3.5-turbo.

jwchang0206 avatar Mar 18 '23 07:03 jwchang0206

@ted-at-openai

jwchang0206 avatar Mar 19 '23 19:03 jwchang0206

Can you give an example? The counts match in the example in the notebook.

ted-at-openai avatar Mar 20 '23 22:03 ted-at-openai

const numberOfTokensFromTurboMessages = (
  messages: Array<ChatCompletionRequestMessage>,
) => {
  // Get the tokenizer for the target model
  const encoding = encoding_for_model('gpt-3.5-turbo');
  let numTokens = 0;
  messages.forEach((message) => {
    numTokens += 4; // every message follows <im_start>{role/name}\n{content}<im_end>\n
    Object.keys(message).forEach((key) => {
      const value = message[
        key as keyof ChatCompletionRequestMessage
      ] as string;
      numTokens += encoding.encode(value).length;
      if (key === 'name') {
        numTokens -= 1; // if there's a name, it replaces the role, whose 1 token is already counted
      }
    });
  });
  numTokens += 3; // every reply is primed with <im_start>assistant
  encoding.free();
  return numTokens;
};

const tokensLength = numberOfTokensFromTurboMessages(messages);
const modelMaxTokens = 4096 - tokensLength; // remaining budget for the completion
await openai.createChatCompletion({
  model: 'gpt-3.5-turbo',
  messages, // messages: Array<ChatCompletionRequestMessage>
  max_tokens: modelMaxTokens,
  ...
});
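For reference, the framing overhead alone (everything except the content tokens) can be computed without a tokenizer. This is a minimal sketch, assuming the gpt-3.5-turbo-0301 format used above: 4 tokens per message plus 3 tokens priming the assistant reply; `framingOverhead` is a hypothetical helper, not part of the code above.

```typescript
// Per-message framing: <im_start>{role/name}\n{content}<im_end>\n → 4 tokens.
// Every reply is primed with <im_start>assistant → 3 tokens.
// Content tokens (from the tokenizer) must be added on top of this.
const framingOverhead = (messageCount: number): number =>
  messageCount * 4 + 3;

console.log(framingOverhead(2)); // 11 (e.g. one system + one user message)
```

This matches the constants in the function above: the off-by-one error disappears once the reply priming is counted as 3 tokens rather than 2.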

@ted-at-openai This is a simplified version of the code I'm using; it used to produce an off-by-one error. After the change, the counts now match.

jwchang0206 avatar Mar 21 '23 11:03 jwchang0206

https://github.com/openai/openai-cookbook/pull/254

jwchang0206 avatar Mar 22 '23 07:03 jwchang0206

FYI, we just changed how we count tokens on our backend. Total token consumption is the same, but we're now counting one more token as part of the prompt and one fewer for the completion. I'll update the code. Be aware that this may change again in the future.
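To illustrate the accounting change described above (the usage numbers here are made up, not real API output): one token moves from the completion count to the prompt count, while the total is conserved.

```typescript
// Illustrative prompt/completion usage before and after the backend change.
const before = { prompt: 14, completion: 20 };
const after = { prompt: before.prompt + 1, completion: before.completion - 1 };

const total = (u: { prompt: number; completion: number }) => u.prompt + u.completion;
console.log(total(before) === total(after)); // true — billing total is unchanged
```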

ted-at-openai avatar Mar 24 '23 23:03 ted-at-openai

Fixed in #278

ted-at-openai avatar Mar 25 '23 04:03 ted-at-openai