🐛 [firestore-bigquery-export] backfilling less than 300k docs took days and cost ~$200 USD

Open jjaklitsch opened this issue 1 year ago • 7 comments

[READ] Step 1: Are you in the right place?

Issues filed here should be about bugs for a specific extension in this repository. If you have a general question, need help debugging, or fall into some other category use one of these other channels:

  • For general technical questions, post a question on StackOverflow with the firebase tag.
  • For general Firebase discussion, use the firebase-talk google group.
  • To file a bug against the Firebase Extensions platform, or for an issue affecting multiple extensions, please reach out to Firebase support directly.

[REQUIRED] Step 2: Describe your configuration

  • Extension name: firestore-bigquery-export
  • Extension version: 0.1.46
  • Configuration values (redact info where appropriate):
      • Cloud Functions location: redacted
      • BigQuery Dataset location: redacted
      • BigQuery Project ID: redacted
      • Database ID: (default)
      • Collection path: occasions
      • Enable Wildcard Column field with Parent Firestore Document IDs (Optional): false
      • Dataset ID: firestore_raw_export
      • Table ID: occasions_v2
      • BigQuery SQL table Time Partitioning option type (Optional): none
      • BigQuery Time Partitioning column name (Optional): createdAt
      • Firestore Document field name for BigQuery SQL Time Partitioning field option (Optional): createdAt
      • BigQuery SQL Time Partitioning table schema field(column) type (Optional): TIMESTAMP
      • BigQuery SQL table clustering (Optional): Parameter not set
      • Maximum number of synced documents per second (Optional): 100
      • Backup Collection Name (Optional): Parameter not set
      • Transform function URL (Optional): Parameter not set
      • Use new query syntax for snapshots: no
      • Exclude old data payloads (Optional): no
      • Import existing Firestore documents into BigQuery?: yes
      • Existing Documents Collection (Optional): occasions
      • Use Collection Group query (Optional): no
      • Docs per backfill: 200
      • Cloud KMS key name (Optional): Parameter not set

[REQUIRED] Step 3: Describe the problem

Steps to reproduce:

We installed the extension and enabled the option to import existing records. The Firestore database we imported from had <250K records. While importing, we saw a massive spike in Firestore reads, up to 45 million per hour. Our typical read volume is <10K per hour. We incurred a cost of ~$200 just from running this import.

Expected result

The BigQuery table is created with minimal impact on Firestore read volumes.

Actual result

45 million Firestore reads per hour, and 120 million reads in total over a few hours.

jjaklitsch avatar Mar 26 '24 02:03 jjaklitsch

Linking to the same bug someone else reported: https://github.com/firebase/extensions/issues/2000

jjaklitsch avatar Mar 26 '24 02:03 jjaklitsch

Hey, looking into this now. Do you have any relevant Cloud Functions logs/errors?

cabljac avatar Mar 26 '24 09:03 cabljac

I believe this issue is caused by us using offset to paginate, I am working on an alternative approach.
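For a sense of scale, here is a rough sketch of how offset pagination could produce read counts like the ones reported. It assumes each backfill batch issues one query whose offset equals the number of documents already processed, and it relies on Firestore billing a read for every document skipped by an offset. The function names are hypothetical and this is not the extension's actual code.

```typescript
// Estimated billed reads when every page is fetched with `offset(n).limit(pageSize)`.
// Assumption: each skipped document is billed as a read, in addition to the
// documents actually returned.
function offsetPaginationReads(totalDocs: number, pageSize: number): number {
  let reads = 0;
  for (let offset = 0; offset < totalDocs; offset += pageSize) {
    const returned = Math.min(pageSize, totalDocs - offset);
    reads += offset + returned; // skipped docs + returned docs
  }
  return reads;
}

// Estimated billed reads when every page resumes from a cursor instead:
// each document is read exactly once.
function cursorPaginationReads(totalDocs: number): number {
  return totalDocs;
}

// Figures from this issue: roughly 250K documents, 200 docs per backfill batch.
console.log(offsetPaginationReads(250_000, 200)); // 156375000
console.log(cursorPaginationReads(250_000)); // 250000
```

Under these assumptions the read count grows roughly with the square of the collection size, which is the same order of magnitude as the ~120 million reads observed.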

cabljac avatar Mar 26 '24 10:03 cabljac

Yes, see attached for the logs. firestore-export-logs.docx

When do you think you'll have a fix in? Also, what's the process for requesting a credit?

jjaklitsch avatar Mar 28 '24 04:03 jjaklitsch

Hi - is there any update on this? Any other recommendations for streaming firestore data to bigquery?

jjaklitsch avatar Apr 02 '24 05:04 jjaklitsch

You can still use the extension for streaming. This issue only affects backfilling, which we have disabled for now. Another way to backfill your existing data is to use the import script, which you can run locally.

You can reach out to Firebase support on this link.

pr-Mais avatar Apr 02 '24 11:04 pr-Mais

Hi, software engineer from Firebase here.

Just wanted to chime in on this issue. We have turned off backfill, so if you use the latest version you won't run into the issue, and as Mais explained above, the import script is the temporary workaround.

That being said, we are actively reworking the backfill implementation so that offset isn't used. Will follow up when that is pushed out.
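As an illustration of what pagination without offset can look like, below is a minimal sketch of cursor (keyset) pagination against an in-memory stand-in for a collection. It is not the reworked backfill implementation. `fetchPageAfter` and `backfillWithCursor` are hypothetical names, and a real Firestore version would presumably be asynchronous and build the page query with something like `orderBy(FieldPath.documentId()).startAfter(lastId).limit(pageSize)`.

```typescript
// Hypothetical stand-in for a Firestore collection: an array of document IDs
// sorted in ascending string order.
type Page = { ids: string[]; reads: number };

// Returns up to `limit` IDs that sort strictly after `afterId`.
// Only the returned documents count as reads; nothing is skipped and billed.
// (Ignores the minimum one-read charge Firestore applies to an empty query.)
function fetchPageAfter(sortedIds: string[], afterId: string | null, limit: number): Page {
  const start = afterId === null ? 0 : sortedIds.findIndex((id) => id > afterId);
  if (start === -1) return { ids: [], reads: 0 };
  const ids = sortedIds.slice(start, start + limit);
  return { ids, reads: ids.length };
}

// Walks the whole collection page by page, resuming each page from the last
// ID seen rather than from a numeric offset.
function backfillWithCursor(
  sortedIds: string[],
  pageSize: number
): { visited: string[]; reads: number } {
  const visited: string[] = [];
  let reads = 0;
  let cursor: string | null = null;
  while (true) {
    const page: Page = fetchPageAfter(sortedIds, cursor, pageSize);
    if (page.ids.length === 0) break;
    visited.push(...page.ids);
    reads += page.reads;
    cursor = page.ids[page.ids.length - 1];
  }
  return { visited, reads };
}
```

With this shape, the total read count in the sketch equals the number of documents regardless of page size, because each page starts where the previous one ended.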

huangjeff5 avatar May 29 '24 01:05 huangjeff5

Hi, I'm going to close this issue so that we have a single issue tracking backfill: https://github.com/firebase/extensions/issues/2029

cabljac avatar May 27 '25 08:05 cabljac