Get token usage for a query
How do I track the token usage of a query run with BAML?
from baml_client.sync_client import b
from baml_client.types import Resume
def example(raw_resume: str) -> Resume:
    # BAML's internal parser guarantees ExtractResume
    # will always return a Resume type
    response = b.ExtractResume(raw_resume)
    return response
In this snippet, the Resume object returned by b.ExtractResume(raw_resume) doesn't include the prompt and completion token counts used to fulfil the query.
Hi @Sidd065, since BAML offers features like fallbacks and retry_policies, it can be a bit tricky to surface token usage from a function (a single call to the function may have made multiple LLM requests).
Could you share what your use case for this data is? Is it primarily just to know the token usage? Or do you want to make software decisions based on it later on?
We are working on an interface that will provide the raw HTTP response, which will include things like token usage when the model provider returns it.
In the meantime, we do have an observability platform that captures some of that metadata as well: https://docs.boundaryml.com/docs/observability/tracing-tagging
If you'd like to get set up with that, please reach out at [email protected]
I decided to check out baml after seeing this blog post. https://www.boundaryml.com/blog/sota-function-calling
I want to compare performance and token usage for some of my own prompts which are currently run with openai/langchain.
When I couldn't find a way to track token usage in the documentation, I thought I was missing something, since most LLM libraries have ways to track it.
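For context, this is roughly the kind of usage reporting I'm used to from the raw OpenAI Python SDK (a minimal sketch; the model and prompt are placeholders):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": "Extract fields from this resume..."}],
)

# The chat completions response carries token counts alongside the content.
print(response.usage.prompt_tokens)      # input tokens
print(response.usage.completion_tokens)  # output tokens
print(response.usage.total_tokens)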
Ah got it, yes! we're working on this and hope to have a way of doing this pretty soon:
from baml_client.sync_client import b
from baml_client.types import Resume
def example(raw_resume: str) -> Resume:
    # BAML's internal parser guarantees ExtractResume
    # will always return a Resume type
    response = b.raw.ExtractResume(raw_resume)
    response.token_usage.input
    response.token_usage.output
    response.token_usage.token
    return response
But we're expecting this to land roughly by mid next week!
As for how you can do this today: if you port one of your prompts over to BAML, you should be able to see the token usage in the playground!
+1 for getting the token count. We are adding usage-based pricing, where we need to know the number of tokens used so we can pass it to Stripe's metered billing.
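For what it's worth, here's a rough sketch of what we'd do with those counts once we have them (the subscription item ID is a placeholder; this uses Stripe's usage-record API for metered prices):

import time
import stripe

stripe.api_key = "sk_test_..."  # placeholder key

def report_token_usage(subscription_item_id: str, tokens_used: int) -> None:
    # Push this call's token count into Stripe's metered billing.
    stripe.SubscriptionItem.create_usage_record(
        subscription_item_id,   # e.g. an "si_..." ID (placeholder)
        quantity=tokens_used,
        timestamp=int(time.time()),
        action="increment",     # add to the current billing period's total
    )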
Ah got it, yes! we're working on this and hope to have a way of doing this pretty soon:
from baml_client.sync_client import b
from baml_client.types import Resume

def example(raw_resume: str) -> Resume:
    # BAML's internal parser guarantees ExtractResume
    # will always return a Resume type
    response = b.raw.ExtractResume(raw_resume)
    response.token_usage.input
    response.token_usage.output
    response.token_usage.token
    return response

But we're expecting this to land roughly by mid next week!
Hey @hellovai, as of v0.55.3 this isn't available on the canary branch, is there any alternate branch where this is implemented? The lack of this feature is the only thing stopping me from switching to BAML.
Additional requests:
- https://discord.com/channels/1119368998161752075/1253172394345107466/1283256668393701419
- https://discord.com/channels/1119368998161752075/1274836321948663939/1275194199582572627
My current workaround plan is to have the application internally proxy requests from BAML providers to OpenAI/Anthropic/OpenRouter, both so I can capture the full request/response bodies for my own o11y stack and so I can pull token usage from the responses - but having this built in would be super lovely!
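Roughly what I have in mind for that proxy, as a minimal sketch (assuming FastAPI and httpx, non-streaming responses only; the upstream URL is a placeholder, and the BAML client's base_url would point at this service):

import httpx
from fastapi import FastAPI, Request
from fastapi.responses import Response

app = FastAPI()
UPSTREAM = "https://api.openai.com"  # placeholder upstream

@app.post("/{path:path}")
async def proxy(path: str, request: Request) -> Response:
    body = await request.body()
    # Forward everything except headers the proxy must set itself.
    headers = {
        k: v for k, v in request.headers.items()
        if k.lower() not in ("host", "content-length")
    }
    async with httpx.AsyncClient() as client:
        upstream = await client.post(
            f"{UPSTREAM}/{path}", content=body, headers=headers, timeout=120.0
        )
    # Log full bodies for o11y; pull token usage if the provider includes it.
    usage = upstream.json().get("usage")
    if usage is not None:
        print("token usage:", usage)
    return Response(
        content=upstream.content,
        status_code=upstream.status_code,
        media_type=upstream.headers.get("content-type"),
    )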
we're finally working on this btw!!
We'd love some preliminary reviews on here if you all have opinions prior to us implementing this. Sorry it's taken us so long, but we've got a design we hope you like and that will offer lots of flexibility!
https://github.com/orgs/BoundaryML/discussions/1289
@lukeramsden @Sidd065 @polzounov
This has been resolved with https://docs.boundaryml.com/guide/baml-advanced/collector-track-tokens
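Based on that guide, tracking usage now looks roughly like this (a sketch of the Collector API; see the linked docs for the exact surface):

from baml_py import Collector
from baml_client.sync_client import b

def example(raw_resume: str):
    collector = Collector(name="extract-resume")
    # Pass the collector via baml_options to capture per-call metadata.
    response = b.ExtractResume(raw_resume, baml_options={"collector": collector})

    # Token usage for the most recent call made by this function.
    print(collector.last.usage.input_tokens)
    print(collector.last.usage.output_tokens)
    return response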