🐛 [firestore-bigquery-export] Import Script Throws 'Request Entity Too Large'
[REQUIRED] Step 2: Describe your configuration
- Extension name: firestore-bigquery-export
- Extension version: 0.1.51
- Configuration values (redact info where appropriate):
- BigQuery Dataset location -> us
- BigQuery Project ID -> xxxxxxxxxxxxxxxxx
- Collection path -> xxxxxxxxxxxxxxxxxxx
- Enable Wildcard Column field with Parent Firestore Document IDs (Optional) -> false
- Dataset ID -> firestore_events
- Table ID -> events
- BigQuery SQL table Time Partitioning option type (Optional) -> NONE
- BigQuery Time Partitioning column name (Optional) -> Parameter not set
- Firestore Document field name for BigQuery SQL Time Partitioning field option (Optional) -> Parameter not set
- BigQuery SQL Time Partitioning table schema field(column) type (Optional) -> omit
- BigQuery SQL table clustering (Optional) -> Parameter not set
- Maximum number of synced documents per second (Optional) -> 100
- Backup Collection Name (Optional) -> Parameter not set
- Transform function URL (Optional) -> Parameter not set
- Use new query syntax for snapshots -> yes
- Exclude old data payloads (Optional) -> yes
- Use Collection Group query (Optional) -> no
- Cloud KMS key name (Optional) -> Parameter not set
[REQUIRED] Step 3: Describe the problem
We cannot import existing documents that may be up to 900KB in size.
Steps to reproduce:
- Use the extension configured on some collection
- Reconfigure the extension, setting EXCLUDE_OLD_DATA to true
- When reconfiguring, confirm DO_BACKFILL is no longer available
- Create a document that is 900KB (a minimal sketch for creating one follows this list)
- Confirm the document does not sync and fails with the error `task size too large`
- Run the `fs-bq-import-collection` script to try to import existing data
- Confirm it fails with the error `Request Entity Too Large`
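For reference, a minimal sketch of creating such a document with the firebase-admin SDK (the collection name and document ID here are placeholders):

```ts
import * as admin from "firebase-admin";

admin.initializeApp();

async function writeLargeDoc(): Promise<void> {
  const db = admin.firestore();
  // ~900KB string payload; Firestore's per-document limit is roughly 1MiB.
  const payload = "x".repeat(900 * 1024);
  // "my_collection" is a placeholder for the collection the extension watches.
  await db.collection("my_collection").doc("large-doc").set({ payload });
  console.log("Wrote a ~900KB document");
}

writeLargeDoc().catch(console.error);
```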
During installation of the extension we cannot backfill existing data, because DO_BACKFILL is currently disabled by #2005.
Expected result
- DO_BACKFILL is available during both installation and reconfiguration of the extension
- The `fs-bq-import-collection` script does not throw a `Request Entity Too Large` error if we are ignoring old data by setting the EXCLUDE_OLD_DATA flag to true
Actual result
- DO_BACKFILL is not available when the extension is installed nor when it is reconfigured
- The `fs-bq-import-collection` script fails with `Request Entity Too Large`
any updates on this?
Hi, so this seems like two issues:
- DO_BACKFILL has been removed for now, as there were some scaling issues with the design of this feature.
- `Request Entity Too Large` - I will investigate this. Are the firestore docs in question quite large?
@cabljac in our case, we definitely have some documents approaching the firestore document size limit. If there's a bigquery entity size limit that's < the firestore size limit, I'd expect those documents to fail to import, not the whole job to fail.
thanks! good to know.
Yeah, I can see it hitting the 10MB request size limit specified here, hmm.
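Rough arithmetic, assuming an illustrative batch size rather than the script's actual default:

```ts
// Back-of-the-envelope check: many ~900KB rows serialized into one request
// blow past a ~10MB request size limit very quickly.
const docSizeBytes = 900 * 1024;        // ~900KB per document
const batchSize = 300;                  // illustrative batch size, not the script's default
const requestBytes = docSizeBytes * batchSize;
const limitBytes = 10 * 1024 * 1024;    // ~10MB request size limit

console.log(`request ~${(requestBytes / (1024 * 1024)).toFixed(1)}MB vs ~10MB limit`);
console.log(`only ~${Math.floor(limitBytes / docSizeBytes)} such documents fit under the limit`);
```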
I think what we could do:
- If a batch fails, we log that it failed and skip it. We could provide an optional `--output-failed-batches` flag or similar, to save references to the failed imports (a rough sketch of this follows below).
- We could consider a dynamic batch size option, where if a batch is too big, we split it. I'm wary that with 900KB documents, the batch size might have to be quite small, resulting in a very long import process.
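For illustration, a rough sketch of the first option (the types, function names, and the `--output-failed-batches` flag are hypothetical, not the import script's actual internals):

```ts
// Skip a failed batch, keep going, and collect references to the skipped
// documents so they can be written out when the flag is set.
type Row = { documentPath: string; data: unknown };

async function importWithSkips(
  batches: Row[][],
  insertBatch: (rows: Row[]) => Promise<void>
): Promise<string[]> {
  const failedDocPaths: string[] = [];

  for (const batch of batches) {
    try {
      await insertBatch(batch);
    } catch (err) {
      console.warn(`Batch of ${batch.length} rows failed, skipping:`, err);
      failedDocPaths.push(...batch.map((row) => row.documentPath));
    }
  }

  // The caller could write failedDocPaths to a JSON file when
  // --output-failed-batches (hypothetical) is passed.
  return failedDocPaths;
}
```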
The repro suggests a single 900KB document would cause the error. I need to test this out, since it shouldn't fail if my theory about the request size limit is correct.
From my testing it seems a single large document is OK, which suggests the problem is the overall request size limit.
confirmed this is the issue
In this PR https://github.com/firebase/extensions/pull/2264 I've made it so that this is the case:

> I'd expect those documents to fail to import, not the whole job to fail.

In the future we could experiment with dynamic batching based on document size.
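A rough sketch of what size-aware batching could look like (the byte budget and row shape are assumptions, not the script's current implementation):

```ts
// Pack rows into requests that stay under a byte budget instead of using a
// fixed row count. The 9MB budget leaves headroom below a ~10MB request limit.
type BigQueryRow = Record<string, unknown>;

function splitBySerializedSize(
  rows: BigQueryRow[],
  maxBytes: number = 9 * 1024 * 1024
): BigQueryRow[][] {
  const batches: BigQueryRow[][] = [];
  let current: BigQueryRow[] = [];
  let currentBytes = 0;

  for (const row of rows) {
    const rowBytes = Buffer.byteLength(JSON.stringify(row), "utf8");
    // Start a new batch if adding this row would exceed the budget.
    if (current.length > 0 && currentBytes + rowBytes > maxBytes) {
      batches.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push(row);
    currentBytes += rowBytes;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

With this approach a single oversized document still ends up in its own batch, so only that request fails rather than the whole import.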