Get token usage for a query
How do I track the token usage of a query run with BAML?
from baml_client.sync_client import b
from baml_client.types import Resume
def example(raw_resume: str) -> Resume:
    # BAML's internal parser guarantees ExtractResume
    # will always return a Resume type
    response = b.ExtractResume(raw_resume)
    return response
In this snippet, the Resume object returned by b.ExtractResume(raw_resume) doesn't include the prompt and completion token counts used to fulfil the query.
Hi @Sidd065, since BAML offers features like fallbacks and retry_policies, it can be a bit tricky to surface token usage from a function (a single call to the function may have made multiple LLM requests).
Could you share what your use case for this data is? Is it primarily just to know the token usage? Or do you want to make software decisions based on it later on?
We are working on an interface that will provide the raw HTTP response, which will include things like token usage when the model provider returns it.
In the meantime, we do have an observability platform that captures some of that metadata as well: https://docs.boundaryml.com/docs/observability/tracing-tagging
If you'd like to get set up with that, please reach out at [email protected]
I decided to check out baml after seeing this blog post. https://www.boundaryml.com/blog/sota-function-calling
I want to compare performance and token usage for some of my own prompts which are currently run with openai/langchain.
When I couldn't find a way to track token usage in the documentation, I thought I was missing something, since most LLM libraries have ways to track it.
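For context, this is roughly the kind of usage reporting I'm used to from the raw OpenAI Python SDK (a minimal sketch; the model and prompt are placeholders):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": "Extract fields from this resume..."}],
)

# The chat completions response carries token counts alongside the content.
print(response.usage.prompt_tokens)      # input tokens
print(response.usage.completion_tokens)  # output tokens
print(response.usage.total_tokens)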
Ah got it, yes! we're working on this and hope to have a way of doing this pretty soon:
from baml_client.sync_client import b
from baml_client.types import Resume
def example(raw_resume: str) -> Resume:
    # BAML's internal parser guarantees ExtractResume
    # will always return a Resume type
    response = b.raw.ExtractResume(raw_resume)
    response.token_usage.input
    response.token_usage.output
    response.token_usage.token
    return response
But we're expecting this to land roughly by mid next week!
As for how you can do this today: if you port one of your prompts over to BAML, you should be able to see the token usage in the playground!
+1 for getting the token count. We are adding usage-based pricing, where we need to know the number of tokens used so we can pass it to Stripe's metered billing.
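For what it's worth, here's a rough sketch of what we'd do with those counts once we have them (the subscription item ID is a placeholder; this uses Stripe's usage-record API for metered prices):

import time
import stripe

stripe.api_key = "sk_test_..."  # placeholder key

def report_token_usage(subscription_item_id: str, tokens_used: int) -> None:
    # Push this call's token count into Stripe's metered billing.
    stripe.SubscriptionItem.create_usage_record(
        subscription_item_id,   # e.g. an "si_..." ID (placeholder)
        quantity=tokens_used,
        timestamp=int(time.time()),
        action="increment",     # add to the current billing period's total
    )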
Ah got it, yes! we're working on this and hope to have a way of doing this pretty soon:
from baml_client.sync_client import b
from baml_client.types import Resume

def example(raw_resume: str) -> Resume:
    # BAML's internal parser guarantees ExtractResume
    # will always return a Resume type
    response = b.raw.ExtractResume(raw_resume)
    response.token_usage.input
    response.token_usage.output
    response.token_usage.token
    return response

But we're expecting this to land roughly by mid next week!
Hey @hellovai, as of v0.55.3 this isn't available on the canary branch, is there any alternate branch where this is implemented? The lack of this feature is the only thing stopping me from switching to BAML.
Additional requests:
- https://discord.com/channels/1119368998161752075/1253172394345107466/1283256668393701419
- https://discord.com/channels/1119368998161752075/1274836321948663939/1275194199582572627
My current workaround plan is to have the application internally proxy requests from BAML providers to OpenAI/Anthropic/OpenRouter, both so I can capture the full request/response bodies for my own o11y stack and so I can pull token usage from the responses - but having this built in would be super lovely!
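Roughly what I have in mind for that proxy, as a minimal sketch (assuming FastAPI and httpx, non-streaming responses only; the upstream URL is a placeholder, and the BAML client's base_url would point at this service):

import httpx
from fastapi import FastAPI, Request
from fastapi.responses import Response

app = FastAPI()
UPSTREAM = "https://api.openai.com"  # placeholder upstream

@app.post("/{path:path}")
async def proxy(path: str, request: Request) -> Response:
    body = await request.body()
    # Forward everything except headers the proxy must set itself.
    headers = {
        k: v for k, v in request.headers.items()
        if k.lower() not in ("host", "content-length")
    }
    async with httpx.AsyncClient() as client:
        upstream = await client.post(
            f"{UPSTREAM}/{path}", content=body, headers=headers, timeout=120.0
        )
    # Log full bodies for o11y; pull token usage if the provider includes it.
    usage = upstream.json().get("usage")
    if usage is not None:
        print("token usage:", usage)
    return Response(
        content=upstream.content,
        status_code=upstream.status_code,
        media_type=upstream.headers.get("content-type"),
    )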
we're finally working on this btw!!
We'd love some preliminary reviews on here if you all have opinions prior to us implementing this. Sorry it's taken us so long, but we've got a design we hope you like and that will offer lots of flexibility!
https://github.com/orgs/BoundaryML/discussions/1289
@lukeramsden @Sidd065 @polzounov
This has been resolved with https://docs.boundaryml.com/guide/baml-advanced/collector-track-tokens
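Based on that guide, tracking usage now looks roughly like this (a sketch of the Collector API; see the linked docs for the exact surface):

from baml_py import Collector
from baml_client.sync_client import b

def example(raw_resume: str):
    collector = Collector(name="extract-resume")
    # Pass the collector via baml_options to capture per-call metadata.
    response = b.ExtractResume(raw_resume, baml_options={"collector": collector})

    # Token usage for the most recent call made by this function.
    print(collector.last.usage.input_tokens)
    print(collector.last.usage.output_tokens)
    return response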