How to track token usage
The non-streaming chat/completions API from OpenAI has a usage object in the response:
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "\n\nHello there, how may I assist you today?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
This is great when you have different users and want to enforce limits based on their usage.
Is there a recommended way to do the same with the Vercel AI SDK?
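For reference, reading that field from the non-streaming call looks roughly like this (a minimal sketch, assuming the openai v4 Node SDK and an OPENAI_API_KEY in the environment):

```ts
import OpenAI from "openai";

const openai = new OpenAI(); // picks up OPENAI_API_KEY from the environment

const completion = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "Hello!" }],
});

// Non-streaming responses carry the usage object shown above.
console.log(completion.usage?.prompt_tokens, completion.usage?.completion_tokens);
```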
One option might be to count prompt tokens with a tokenizer library before starting the stream, or in the onStart callback.
For the completion tokens, the streaming APIs return one token at a time, so you can track this in the onToken callback.
From a quick search, dqbd/tiktoken seems to support Vercel's Edge Runtime.
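A rough sketch of the prompt-token half of that idea, assuming dqbd's pure-JS js-tiktoken package and gpt-3.5-turbo (the per-message overhead constant below is an approximation and varies by model):

```ts
import { encodingForModel } from "js-tiktoken";

const enc = encodingForModel("gpt-3.5-turbo");

const messages = [{ role: "user" as const, content: "Hello!" }];

// Approximate prompt tokens: encode each message's content and add a small
// per-message overhead for the chat format (the exact constant is model-dependent).
const promptTokens =
  messages.reduce((sum, m) => sum + enc.encode(m.content).length, 0) +
  4 * messages.length;
```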
Great feedback. We want to add this to the playground too.
I use https://github.com/dqbd/tiktoken for our production application. I've noticed that it gets very slow as the number of tokens you are counting goes up, so if you count the full text in the onCompletion callback you may hit the same slowdown. If you use it in the onToken callback as suggested, it should be fast enough, since each chunk is small.
We built this for this exact use case with David’s tiktokenizer package and David’s help. https://tiktokenizer.vercel.app/
Apparently, OpenAI already has a feature for this, but it's disabled.
From https://community.openai.com/t/usage-info-in-api-responses/18862/3 :
The feature wasn’t enabled in streaming by default because we found that it could break existing integrations. It does exist though! If you would like it turned on, send us a message at help.openai.com
Maybe someone can convince them to enable it with a flag or something.
We've done a bit of research here and every tokenizer is too large (generally due to wasm) for us to include by default with the SDK.
Our recommendation going forward will be to use your tokenizer of choice paired with the onToken / onCompletion callbacks. Will this work sufficiently? If so, I'll add it to the docs.
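Roughly, that pattern could look like the sketch below (assuming a Next.js route handler, the openai v4 SDK, and dqbd's pure-JS js-tiktoken; counting per token in onToken keeps each encode call small, per the comment above):

```ts
import OpenAI from "openai";
import { OpenAIStream, StreamingTextResponse } from "ai";
import { encodingForModel } from "js-tiktoken";

const openai = new OpenAI();
const enc = encodingForModel("gpt-3.5-turbo");

export async function POST(req: Request) {
  const { messages } = await req.json();

  const response = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    stream: true,
    messages,
  });

  let completionTokens = 0;

  const stream = OpenAIStream(response, {
    onToken(token) {
      // Each streamed chunk is short, so encoding it here stays cheap.
      completionTokens += enc.encode(token).length;
    },
    onCompletion() {
      // Record per-user usage here (database, analytics, etc.).
      console.log({ completionTokens });
    },
  });

  return new StreamingTextResponse(stream);
}
```

Note that onToken receives each text delta from the stream, which may not map one-to-one to BPE tokens, so encoding each chunk (rather than just counting callback invocations) gives a closer count.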
Makes sense.
To be honest, my hope was that Vercel could convince OpenAI to add the usage field to their streaming responses, especially since it's apparently something they already have, just disabled.
Maybe not the right place to ask, but how can we access the finish_reason when streaming with the new v4 SDKs?