num_tokens_from_messages token count off by 1
Today I found that my token count calculation for gpt-3.5-turbo was off by one.
@ted-at-openai
Can you give an example? The counts match in the example in the notebook.
// Imports assume the `tiktoken` and `openai` (v3) npm packages, which export
// `encoding_for_model` and the `ChatCompletionRequestMessage` type.
import { encoding_for_model } from 'tiktoken';
import { ChatCompletionRequestMessage } from 'openai';

const numberOfTokensFromTurboMessages = (
  messages: Array<ChatCompletionRequestMessage>,
) => {
  // Load the tokenizer used by gpt-3.5-turbo
  const encoding = encoding_for_model('gpt-3.5-turbo');
  let numTokens = 0;
  messages.forEach((message) => {
    numTokens += 4; // every message follows <im_start>{role/name}\n{content}<im_end>\n
    Object.keys(message).forEach((key) => {
      const value = message[
        key as keyof ChatCompletionRequestMessage
      ] as string;
      numTokens += encoding.encode(value).length;
      if (key === 'name') {
        // if there's a name, the role is omitted
        numTokens += -1; // role is always required and always 1 token
      }
    });
  });
  numTokens += 3; // every reply is primed with <im_start>assistant
  encoding.free(); // release the WASM tokenizer's memory
  return numTokens;
};
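For context, `messages` below is the prompt that gets passed to the completion call; the contents here are illustrative, not from my app:

const messages: Array<ChatCompletionRequestMessage> = [
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'Hello!' },
];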
const tokensLength = numberOfTokensFromTurboMessages(messages);
// Reserve whatever remains of gpt-3.5-turbo's 4,096-token context window
// for the completion.
const modelMaxTokens = 4096 - tokensLength;
await openai.createChatCompletion({
  model: 'gpt-3.5-turbo',
  messages, // messages: Array<ChatCompletionRequestMessage>,
  max_tokens: modelMaxTokens,
  // ...
});
@ted-at-openai I simplified the code I'm using into the snippet above. It used to give an off-by-one error; after the change, it works now.
https://github.com/openai/openai-cookbook/pull/254
FYI, we just changed how we count tokens on our backend. Total token consumption is the same, but we're now counting one more token as part of the prompt and one less for the completion. I'll update the code. Be aware that this may change again in the future.
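In terms of the counting helper above, that shifts one token from the completion to the prompt. A minimal sketch of the adjustment (assuming the pre-change reply-priming cost was 2 tokens, as in the earlier cookbook logic; shown here only for contrast):

// Before the backend change: priming the reply cost 2 prompt tokens,
// with one more token billed to the completion.
numTokens += 2; // every reply is primed with <im_start>assistant
// After the change: that extra token is billed to the prompt instead,
// so the total across prompt + completion is unchanged.
numTokens += 3; // every reply is primed with <im_start>assistant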
Fixed in #278