
How to track token usage

Open · pomber opened this issue 1 year ago • 5 comments

The non-streaming chat/completions API from OpenAI has a usage object in the response:

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "\n\nHello there, how may I assist you today?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
```

This is great when you have different users and want to set limits per user based on usage.

Any recommended way to do the same with the Vercel AI SDK?

pomber avatar Jun 15 '23 23:06 pomber

One option might be to count prompt tokens with a tokenizer library before starting the stream, or in the onStart callback.

For the completion tokens, the streaming APIs return one token at a time, so you can track this in the onToken callback.

From a quick search, dqbd/tiktoken seems to support Vercel's Edge Runtime.

TheBinaryGuy avatar Jun 16 '23 01:06 TheBinaryGuy
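
A minimal sketch of this approach, assuming an Edge route built on the AI SDK's OpenAIStream helper, the openai v4 client, and the tiktoken package (the model name, route shape, and logging are illustrative, not a verified implementation):

```ts
import OpenAI from 'openai';
import { OpenAIStream, StreamingTextResponse } from 'ai';
import { encoding_for_model } from 'tiktoken';

const openai = new OpenAI();

export async function POST(req: Request) {
  const { messages } = await req.json();

  // Count prompt tokens up front with the tokenizer. This is an
  // approximation: it ignores the few framing tokens OpenAI adds
  // around each chat message.
  const enc = encoding_for_model('gpt-3.5-turbo');
  let promptTokens = 0;
  for (const m of messages) {
    promptTokens += enc.encode(m.content).length;
  }

  let completionTokens = 0;

  const response = await openai.chat.completions.create({
    model: 'gpt-3.5-turbo',
    stream: true,
    messages,
  });

  const stream = OpenAIStream(response, {
    // Encode each streamed chunk as it arrives; the strings are tiny,
    // so this stays fast even on long completions.
    onToken(token) {
      completionTokens += enc.encode(token).length;
    },
    onCompletion() {
      enc.free(); // release the WASM-backed encoder
      // Persist usage for this user here, e.g. to a database.
      console.log({ promptTokens, completionTokens });
    },
  });

  return new StreamingTextResponse(stream);
}
```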

Great feedback. We want to add this to the playground too.

jaredpalmer avatar Jun 17 '23 20:06 jaredpalmer

I use https://github.com/dqbd/tiktoken in our production application. I've noticed it gets very slow as the number of tokens you're counting goes up, so if you try to count the entire completion in the onCompletion callback you may hit the same slowdown. Counting incrementally in the onToken callback, as suggested, should be fast enough.

jensen avatar Jun 19 '23 17:06 jensen
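
For contrast, a hypothetical one-shot count in onCompletion would look like the sketch below; this is the variant that can get slow on long outputs, since the whole completion is re-encoded in a single call:

```ts
const stream = OpenAIStream(response, {
  onCompletion(completion) {
    const enc = encoding_for_model('gpt-3.5-turbo');
    // One big encode call; its cost grows with output length, which is
    // where the slowdown described above shows up.
    const completionTokens = enc.encode(completion).length;
    enc.free();
    console.log({ completionTokens });
  },
});
```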

We built https://tiktokenizer.vercel.app/ for exactly this use case, with David's tiktoken package and David's help.

siddharthsharma94 avatar Jun 20 '23 04:06 siddharthsharma94

Apparently, OpenAI already has a feature for this, but it's disabled.

From https://community.openai.com/t/usage-info-in-api-responses/18862/3 :

> The feature wasn't enabled in streaming by default because we found that it could break existing integrations. It does exist though! If you would like it turned on, send us a message at help.openai.com

Maybe someone can convince them to enable it with a flag or something.

pomber avatar Jun 20 '23 09:06 pomber
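
OpenAI has since exposed this behind a request flag: passing stream_options: { include_usage: true } on a streaming chat completions call makes the API append a final chunk with an empty choices array and a populated usage object. A sketch with the openai Node client:

```ts
const response = await openai.chat.completions.create({
  model: 'gpt-3.5-turbo',
  stream: true,
  // Ask the API to append a final chunk that carries token usage.
  stream_options: { include_usage: true },
  messages,
});

for await (const chunk of response) {
  if (chunk.usage) {
    // Final chunk: choices is empty, usage is populated.
    console.log(chunk.usage); // { prompt_tokens, completion_tokens, total_tokens }
  }
}
```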

We've done a bit of research here and every tokenizer is too large (generally due to wasm) for us to include by default with the SDK.

Our recommendation going forward will be to use your tokenizer of choice paired with the onToken / onCompletion callbacks. Will this work well enough for your use case? If so, I'll add it to the docs.

MaxLeiter avatar Aug 07 '23 18:08 MaxLeiter

> We've done a bit of research here and every tokenizer is too large (generally due to wasm) for us to include by default with the SDK.

Makes sense.

To be honest, my hope was that Vercel could convince OpenAI to add the usage field to their streaming response, especially since it's something they apparently already have, just disabled.

pomber avatar Aug 08 '23 11:08 pomber

Maybe not the right place to ask, but how can we access the stop_reason when streaming with the new v4 SDKs?

colinricardo avatar Aug 23 '23 17:08 colinricardo