
🐛 [firestore-bigquery-export] Import Script Throws 'Request Entity Too Large'

Open · larstbone opened this issue 1 year ago · 1 comment

[REQUIRED] Step 2: Describe your configuration

  • Extension name: firestore-bigquery-export
  • Extension version: 0.1.51
  • Configuration values (redact info where appropriate):

    • BigQuery Dataset location: us
    • BigQuery Project ID: xxxxxxxxxxxxxxxxx
    • Collection path: xxxxxxxxxxxxxxxxxxx
    • Enable Wildcard Column field with Parent Firestore Document IDs (Optional): false
    • Dataset ID: firestore_events
    • Table ID: events
    • BigQuery SQL table Time Partitioning option type (Optional): NONE
    • BigQuery Time Partitioning column name (Optional): Parameter not set
    • Firestore Document field name for BigQuery SQL Time Partitioning field option (Optional): Parameter not set
    • BigQuery SQL Time Partitioning table schema field (column) type (Optional): omit
    • BigQuery SQL table clustering (Optional): Parameter not set
    • Maximum number of synced documents per second (Optional): 100
    • Backup Collection Name (Optional): Parameter not set
    • Transform function URL (Optional): Parameter not set
    • Use new query syntax for snapshots: yes
    • Exclude old data payloads (Optional): yes
    • Use Collection Group query (Optional): no
    • Cloud KMS key name (Optional): Parameter not set

[REQUIRED] Step 3: Describe the problem

We cannot import existing data; some of our documents are up to 900 KB in size.

Steps to reproduce:

  1. Start with the extension configured on a collection
  2. Reconfigure the extension, setting EXCLUDE_OLD_DATA to true
  3. When reconfiguring, confirm DO_BACKFILL is no longer available
  4. Create a document that is roughly 900 KB
  5. Confirm the document does not sync and fails with a "task size too large" error
  6. Run the fs-bq-import-collection script to try to import the existing data (an example invocation is shown after these steps)
  7. Confirm the "Request Entity Too Large" error
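For reference (step 6), the import script can be run with npx @firebaseextensions/fs-bq-import-collection; run interactively, it prompts for roughly the project ID, collection path, dataset ID, table ID, and batch size, so no flags are required. Non-interactive flags also exist; check the import script's README for their exact names.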

During installation, the extension cannot backfill existing data because DO_BACKFILL is currently disabled by #2005.

Expected result
  1. DO_BACKFILL is available during both installation and reconfiguration of the extension
  2. The fs-bq-import-collection script does not throw a "Request Entity Too Large" error when old data is ignored via the EXCLUDE_OLD_DATA flag
Actual result
  1. DO_BACKFILL is not available when the extension is installed, nor when it is reconfigured
  2. The fs-bq-import-collection script fails with "Request Entity Too Large"

larstbone avatar Jul 16 '24 19:07 larstbone

any updates on this?

nikcaryo-super avatar Oct 10 '24 17:10 nikcaryo-super

Hi, so this seems like two issues:

  • DO_BACKFILL has been removed for now, as there were some scaling issues with the design of this feature.

  • Request Entity Too Large: I will investigate this. Are the Firestore docs in question quite large?

cabljac avatar Jan 03 '25 09:01 cabljac

@cabljac in our case, we definitely have some documents approaching the Firestore document size limit. If there's a BigQuery entity size limit that's smaller than the Firestore size limit, I'd expect those documents to fail to import, not the whole job to fail.

nikcaryo-super avatar Jan 03 '25 15:01 nikcaryo-super

thanks! good to know.

cabljac avatar Jan 06 '25 16:01 cabljac

Yeah, I can see it hitting the 10 MB request size limit specified here, hmm.
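To make the arithmetic concrete (the batch size of 300 below is a placeholder, not necessarily the script's default): documents near 900 KB leave room for only a handful of rows per 10 MB request.

```ts
// Back-of-envelope estimate only, not the import script's actual accounting.
const REQUEST_LIMIT_BYTES = 10 * 1024 * 1024;   // ~10 MB request size limit
const avgDocBytes = 900 * 1024;                 // ~900 KB documents from the repro
const batchSize = 300;                          // placeholder batch size

const estimatedRequestBytes = avgDocBytes * batchSize;                   // ~276 MB
const maxDocsPerRequest = Math.floor(REQUEST_LIMIT_BYTES / avgDocBytes); // 11

console.log({ estimatedRequestBytes, maxDocsPerRequest });
```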

cabljac avatar Jan 30 '25 09:01 cabljac

I think what we could do:

  1. If a batch fails, we log the failure and skip it. We could provide an optional --output-failed-batches flag, or similar, to save references to the failed imports (a rough sketch follows below).

  2. We could consider a dynamic batch size option, where if a batch is too big, we split it. I'm wary that with 900 KB documents the batch size might have to be quite small, resulting in a very long import process.

The repro suggests a single 900 KB document would cause the error. I have to test this out, since it shouldn't fail if my theory that the request size is the issue is correct.
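A rough sketch of option 1 above, purely illustrative and not the import script's actual code (the failedBatchesFile parameter stands in for the proposed --output-failed-batches target):

```ts
// Illustrative only: failed batches are logged and skipped, and the affected
// document paths are optionally written to a file for a later retry.
import * as fs from "fs";

type Row = { documentPath: string; data: unknown };

async function importBatches(
  batches: Row[][],
  insertBatch: (rows: Row[]) => Promise<void>, // e.g. a BigQuery insert call
  failedBatchesFile?: string // hypothetical --output-failed-batches target
): Promise<string[]> {
  const failed: string[] = [];
  for (const batch of batches) {
    try {
      await insertBatch(batch);
    } catch (err) {
      console.warn(`Skipping batch of ${batch.length} docs: ${err}`);
      failed.push(...batch.map((row) => row.documentPath));
    }
  }
  if (failedBatchesFile && failed.length > 0) {
    fs.writeFileSync(failedBatchesFile, JSON.stringify(failed, null, 2));
  }
  return failed;
}
```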

cabljac avatar Jan 30 '25 09:01 cabljac

From my testing it seems a single large document is OK, which suggests it is the request size limit

cabljac avatar Jan 30 '25 10:01 cabljac

[Image omitted]

confirmed this is the issue

cabljac avatar Jan 30 '25 10:01 cabljac

In this PR https://github.com/firebase/extensions/pull/2264, I've made it so that this is the case:

"I'd expect those documents to fail to import, not the whole job to fail."

In the future we could experiment with dynamic batching based on document size.
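One possible shape for size-based batching, sketched here purely as an illustration (the byte budget, cap, and helper names are assumptions, not anything from the PR): group documents so each request stays under a byte budget, with a maximum document count as a fallback.

```ts
// Illustrative sketch of dynamic, size-based batching; not the shipped code.
type Doc = { documentPath: string; data: unknown };

const REQUEST_BYTE_BUDGET = 9 * 1024 * 1024; // stay under the ~10 MB request limit

// Rough serialized size of a document; the real row encoding will differ.
function sizeOf(doc: Doc): number {
  return Buffer.byteLength(JSON.stringify(doc));
}

function buildBatches(docs: Doc[], maxDocs = 500): Doc[][] {
  const batches: Doc[][] = [];
  let current: Doc[] = [];
  let currentBytes = 0;
  for (const doc of docs) {
    const bytes = sizeOf(doc);
    // Close the current batch when adding this doc would exceed the byte
    // budget or the document-count cap.
    if (
      current.length > 0 &&
      (currentBytes + bytes > REQUEST_BYTE_BUDGET || current.length >= maxDocs)
    ) {
      batches.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push(doc);
    currentBytes += bytes;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```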

cabljac avatar Jan 30 '25 18:01 cabljac