
firestore-bigquery-export backfilling less than 800k docs took days and cost over 1k U$

Open MorenoMdz opened this issue 1 year ago • 5 comments

[REQUIRED] Step 2: Describe your configuration

  • Extension name: firestore-bigquery-export
  • Extension version: 0.1.45
  • Configuration values (redact info where appropriate):
Cloud Functions location
us-central1
BigQuery Dataset location
us
BigQuery Project ID
redacted
Collection path
jobs
Enable Wildcard Column field with Parent Firestore Document IDs (Optional)
false
Dataset ID
firestore_export
Table ID
jobs
BigQuery SQL table Time Partitioning option type (Optional)
NONE
BigQuery Time Partitioning column name (Optional)
Parameter not set
Firestore Document field name for BigQuery SQL Time Partitioning field option (Optional)
Parameter not set
BigQuery SQL Time Partitioning table schema field(column) type (Optional)
omit
BigQuery SQL table clustering (Optional)
Parameter not set
Maximum number of synced documents per second (Optional)
100
Backup Collection Name (Optional)
Parameter not set
Transform function URL (Optional)
Parameter not set
Use new query syntax for snapshots
no
Exclude old data payloads (Optional)
no
Import existing Firestore documents into BigQuery?
yes
Existing Documents Collection (Optional)
jobs
Use Collection Group query (Optional)
no
Docs per backfill
200
Cloud KMS key name (Optional)
Parameter not set

[REQUIRED] Step 3: Describe the problem

We have been heavy users of this extension for over 2 years, and it has provided great value and near-flawless replication from our Firestore data to BigQuery. On March 19th we set up the exporter on our smaller collection, jobs, which at the time had under 800k documents. The exporter ended up running for almost 2 days, in a pattern that seemed recursive, at times reading over 9k documents per second (note that the Docs per backfill setting was set to 200). This caused a spike of over 1,000 USD in our billing for that one day.

In the Firestore Key Visualizer it is easy to see the read spikes bubbling up and slowly fading until they stopped, over a day later.

I do have a ticket open about this issue, but I do think this should be reported here.

Note: this was the first time we used the "backfill" option from the Firebase console; before that we always backfilled by hand with the good old script.

I cannot provide more information as it would be sensitive, but the internal GCP ticket is 50277733.

Expected result

Under 50 cents of billing for this export, which should have taken a couple of minutes.

Actual result

Over one thousand dollars was charged, and the export took almost two days.

MorenoMdz — Mar 22 '24 14:03

@MorenoMdz I tried reproducing but without luck. Can you check the logs in the function fsexportbigquery — do you see any errors?

pr-Mais — Mar 26 '24 03:03

> @MorenoMdz I tried reproducing but without luck, can you check the logs in the function fsexportbigquery, do you see any errors?

We do have a bunch of "Cannot partition an existing table firestore_export_jobs_raw_changelog" warnings, but that's very common with the FS BQ exporter; no errors, though.

As I mentioned, we have been heavy users of the exporter since 2021, but this was the first collection where the extension itself did the backfill (we always used the backfill script previously), so I would point towards an issue in the backfill itself. If you check the Firestore Key Visualizer you will notice the reads per second bubbled up over the following hours, in a pattern that looked like recursive/exponential calls.

MorenoMdz — Mar 26 '24 11:03

I think we have a clue why this might be happening. The backfilling function uses offset to enqueue the batches to import sequentially; since a Firestore query with an offset still bills a read for every skipped document, the total reads grow quadratically with collection size, which explains the burst in reads you had. Thanks for the details you provided; we will disable this feature until we come up with a solution.
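For context, here is a back-of-the-envelope sketch (a hypothetical helper, not the extension's actual code) of how offset pagination blows up an 800k-document backfill, assuming Firestore's documented behavior of billing a read for every document skipped by an offset:

```python
# Hypothetical illustration: with offset-based batching, each batch
# re-skips (and re-bills) all documents before it, so total billed
# reads grow quadratically with collection size.

def offset_backfill_reads(total_docs: int, batch_size: int) -> int:
    """Total billed reads when every batch re-skips all earlier documents."""
    reads = 0
    for offset in range(0, total_docs, batch_size):
        # each batch bills `offset` skipped reads plus the batch itself
        reads += offset + min(batch_size, total_docs - offset)
    return reads

print(offset_backfill_reads(800_000, 200))  # 1,600,400,000 (~1.6B reads)
```

By contrast, cursor-based pagination (e.g. startAfter) reads each document exactly once, i.e. 800k reads for the same backfill.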

pr-Mais — Mar 26 '24 11:03

> I think we have a clue why this might be happening. The backfilling function uses offset to enqueue the batches to import sequentially, this explains the burst in reads you had. Thanks for the details you provided, we will disable this feature until we come up with a solution.

I see! Thanks for the quick response.

MorenoMdz — Mar 26 '24 11:03

I can also confirm I had the same issue on a fresh project: 33k total docs, docs per backfill set to 200, and 2.7M total reads after the backfill finished.

pr-Mais — Mar 26 '24 11:03