
🚀 [firebase_ai] Still lacking thinking features and explicit/implicit context caching (with reporting in metadata)

Open yvrez opened this issue 5 months ago • 11 comments

Support thinking budget and caching through the firebase AI SDK

These two are critical for apps that must manage user experience & token budget per user. Thinking takes time and is not always required (or may not fit within a limited token budget).

Context caching (implicit and explicit) can help projects that rely on a big context with multiple calls/chains to reduce cost without sacrificing output-token quality/precision. Today no metadata is returned that confirms implicit caching was used (like in the genai SDK), and there is no API to use explicit caching with a TTL.
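
To make the request concrete, here is roughly the shape of API we are hoping for. Everything marked hypothetical below does not exist in firebase_ai today; the names are loosely modeled on the Gemini API's cachedContents resource:

```dart
import 'package:firebase_ai/firebase_ai.dart';

Future<void> cachedChain(String bigSharedContext) async {
  final model = FirebaseAI.googleAI().generativeModel(model: 'gemini-2.5-flash');

  // Hypothetical: create an explicit cache with a TTL, as in the Gemini
  // API's cachedContents resource. No such call exists in firebase_ai today.
  final cache = await model.createCachedContent(
    contents: [Content.text(bigSharedContext)],
    ttl: const Duration(hours: 1),
  );

  final response = await model.generateContent(
    [Content.text('First question about the shared context')],
    cachedContent: cache.name, // hypothetical parameter
  );

  // Hypothetical metadata field confirming a cache hit, mirroring the
  // REST API's usageMetadata.cachedContentTokenCount.
  print(response.usageMetadata?.cachedContentTokenCount);
}
```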

Thanks !

yvrez avatar Jun 26 '25 13:06 yvrez

We really need this. If it will be added, when will it be added? We need an answer to this; it is important for our application that this is added as soon as possible.

wxctemre avatar Jun 30 '25 19:06 wxctemre

Plus one - without control over the thinking budget, it is very difficult to use the new Gemini models effectively. This is a core parameter of these new models.

jacobsimionato avatar Jul 04 '25 05:07 jacobsimionato

Any update? Thanks!

yvrez avatar Jul 07 '25 17:07 yvrez

Thanks for the feedback:

  • Thinking budget is planned for the next release, second half of July (see the sketch after this list).
  • Implicit context caching is enabled by default on Gemini 2.5 models; there is nothing to do in the SDKs.
  • Explicit context caching is on the backlog and not planned for Q3 2025. It requires more work than expected. We also believe that many top scenarios may be resolved with implicit caching and URL Context (when we make it available via Firebase AI Logic).
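
To set expectations, usage of the thinking budget should look roughly like this once it ships (a sketch; the final names in the released SDK may differ):

```dart
import 'package:firebase_ai/firebase_ai.dart';

final model = FirebaseAI.googleAI().generativeModel(
  model: 'gemini-2.5-flash',
  generationConfig: GenerationConfig(
    // Sketch of the planned API: a budget of 0 disables thinking; a
    // positive value caps the tokens spent on internal reasoning.
    // Final names may differ in the released SDK.
    thinkingConfig: ThinkingConfig(thinkingBudget: 0),
  ),
);
```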

marb2000 avatar Jul 18 '25 18:07 marb2000

Thanks for the update @marb2000. I'm going to integrate the thinking budget/features in our app. About implicit caching: do you provide the total cached token count in the response metadata block? That would be super useful, because we could evaluate the effectiveness of our prompt structure and see whether implicit context caching is working for us.
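
Concretely, something like this is what I'd want to write. The cachedContentTokenCount field is the assumption here; it mirrors the REST API's usageMetadata and may not be surfaced in the Dart SDK:

```dart
import 'package:firebase_ai/firebase_ai.dart';

Future<void> checkCacheHit(String prompt) async {
  final model = FirebaseAI.googleAI().generativeModel(model: 'gemini-2.5-flash');
  final response = await model.generateContent([Content.text(prompt)]);

  // usageMetadata is returned today; cachedContentTokenCount is the
  // assumption, mirroring the REST API's usageMetadata field.
  final usage = response.usageMetadata;
  print('prompt tokens: ${usage?.promptTokenCount}');
  print('cached tokens: ${usage?.cachedContentTokenCount}'); // hypothetical
}
```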

yvrez avatar Jul 22 '25 08:07 yvrez

I am unable to offer Firebase AI to users because each query has a filePart reference to a gs:// file, which is a PDF and adds 300K tokens to each query. This file is different for each Cloud Storage user. If there is a mechanism to cache this file and reduce the tokens, I will use it. Otherwise I am stuck.
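
For reference, this is the pattern I'm using today (the bucket path is illustrative); every call re-sends the whole PDF:

```dart
import 'package:firebase_ai/firebase_ai.dart';

Future<String?> askManual(String question, String manualUri) async {
  // Cloud Storage (gs://) file references go through the Vertex AI backend.
  final model = FirebaseAI.vertexAI().generativeModel(model: 'gemini-2.5-flash');

  // Every request re-sends the whole PDF (~300K tokens); nothing is cached.
  final response = await model.generateContent([
    Content.multi([
      TextPart(question),
      FileData('application/pdf', manualUri), // e.g. gs://<bucket>/users/<uid>/manual.pdf
    ]),
  ]);
  return response.text;
}
```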

apps4av avatar Dec 09 '25 01:12 apps4av

@apps4av we are currently thinking about how we can add support for explicit context caching, which differs depending on the scenario. A couple of questions:

Does the PDF file change often, or is it static for that user? If the file is static, caching is effective. However, if it changes frequently, caching might not save much money or time.

How many queries does a single user make against this one PDF? Context caching has a TTL and a storage cost (in addition to the GS file storage). If a user only asks one question and then leaves, you still pay the overhead to load the cache. Caching is generally only worthwhile if the user asks multiple questions about the same document. Is this a multi-turn conversation, or is it a single-shot query?

marb2000 avatar Dec 09 '25 05:12 marb2000

Thanks for your help. The user will upload their car manual and make queries against it. Users typically change their car manual when they get a new car, every 3 to 10 years, so the TTL will be 3 years. They will make daily queries against their manual and will typically exceed a million tokens daily, because the manuals are several MB. If the manual can be cached and not counted towards the token count, I have a business case.

apps4av avatar Dec 09 '25 11:12 apps4av

Thanks for the scenario. It's very helpful for my team.

Also, I know it’s none of my business, but putting on my Product Manager hat for a second, I’ve been thinking about the "Car Manual AI" concept you mentioned. It’s a fantastic use case!

However, something caught my eye, so I ran a "back of the napkin" calculation on the Explicit Context Caching costs. I wanted to flag a major financial risk before you get too far into the architecture.

If you stick to the plan of uploading a manual and keeping a unique cache alive for each individual user for 3 years, the storage fees will likely kill the business model:

  • Storage Cost: ~$1.00 per 1M tokens per hour.
  • Daily Cost per User: $24.00 (just to keep the manual in memory).
  • Annual Cost per User: ~$8,760.

Even if the manual is smaller (e.g., 200k tokens), you are still looking at ~$5/day per user just for storage. Since most users only query their manual once in a while, it is actually ~80x cheaper to not use explicit caching and just pay the standard processing fee for each query.

I know the business model isn't my lane, so please take this with a grain of salt! But if I were in your shoes, I would strongly consider pivoting the architecture to Shared/Deduplicated Caching.

Since thousands of people drive the exact same 2022 Ford F-150, you only need to cache that manual once.

  • Scenario: 1,000 users share one cached manual.
  • Cost: You pay the $24/day storage fee once.
  • Savings: You get the 90% discount on every query those 1,000 users make.

This shifts the math from "financial issues" to "profitable business."

Please double-check my numbers—this is just a quick exercise I did for fun!
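
If you want to re-run the arithmetic yourself, the whole calculation fits in a few lines of Dart. The $1.00 per 1M tokens per hour storage rate and the ~$0.30 per 1M input tokens processing price are the assumptions to verify against the current pricing pages:

```dart
void main() {
  // Assumed rates; double-check against the current pricing pages.
  const cacheStoragePerMTokHour = 1.00; // $ per 1M tokens per hour
  const inputPricePerMTok = 0.30;       // $ per 1M input tokens (Flash-class model)
  const manualMTok = 1.0;               // a ~1M-token manual

  final storagePerDay = manualMTok * cacheStoragePerMTokHour * 24; // $24.00
  final storagePerYear = storagePerDay * 365;                      // ~$8,760
  final uncachedQuery = manualMTok * inputPricePerMTok;            // ~$0.30

  print('per-user storage/day:  \$${storagePerDay.toStringAsFixed(2)}');
  print('per-user storage/year: \$${storagePerYear.toStringAsFixed(0)}');
  print('cache vs one uncached query/day: ~${(storagePerDay / uncachedQuery).round()}x');

  // Shared cache: 1,000 users amortize the same $24/day.
  const users = 1000;
  print('shared storage/user/day: \$${(storagePerDay / users).toStringAsFixed(3)}');
}
```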

marb2000 avatar Dec 10 '25 05:12 marb2000

Thanks Miguel for the insight. Can you please clarify the storage costs? On the Blaze plan I see that the storage cost for Firebase Storage is $0.02 per GB per month. Is the cost you are referring to specific to the cache? Where did you get the numbers? I could find the manual online and use grounding, so the model and search work in combination, but in that case there is a chance that a particular manual is not available, because not all service manuals are easy to find. If the cost numbers you stated are for caching, then caching is useless for me, as you said.

apps4av avatar Dec 10 '25 10:12 apps4av

I found the pricing you mentioned under Gemini pricing, context caching cost. Yes, this caching idea won't work because it's too expensive. I will need to use grounding, with a bit of risk of finding the wrong manual or not finding one at all. Thanks for your help.

apps4av avatar Dec 10 '25 11:12 apps4av