🐛 [firestore-bigquery-export] Task size too large errors occurring even with EXCLUDE_OLD_DATA set to yes/true
[READ] Step 1: Are you in the right place?
Issues filed here should be about bugs for a specific extension in this repository. If you have a general question, need help debugging, or fall into some other category, use one of these other channels:
- For general technical questions, post a question on StackOverflow with the firebase tag.
- For general Firebase discussion, use the firebase-talk google group.
- To file a bug against the Firebase Extensions platform, or for an issue affecting multiple extensions, please reach out to Firebase support directly.
[REQUIRED] Step 2: Describe your configuration
- Extension name: firestore-bigquery-export
- Extension version: 0.1.50
- Configuration values (redact info where appropriate):
- BigQuery Dataset location: us
- BigQuery Project ID: xxx
- Database ID: (default)
- Collection path: xxx
- Enable Wildcard Column field with Parent Firestore Document IDs (Optional): false
- Dataset ID: xxx
- Table ID: xxx
- BigQuery SQL table Time Partitioning option type (Optional): DAY
- BigQuery Time Partitioning column name (Optional): timestamp
- Firestore Document field name for BigQuery SQL Time Partitioning field option (Optional): Parameter not set
- BigQuery SQL Time Partitioning table schema field(column) type (Optional): omit
- BigQuery SQL table clustering (Optional): document_id
- Maximum number of synced documents per second (Optional): 100
- Backup Collection Name (Optional): Parameter not set
- Transform function URL (Optional): Parameter not set
- Use new query syntax for snapshots: no
- Exclude old data payloads (Optional): yes
- Use Collection Group query (Optional): no
- Cloud KMS key name (Optional): Parameter not set
[REQUIRED] Step 3: Describe the problem
Even with the EXCLUDE_OLD_DATA setting enabled to prevent old_data from being populated, we are still seeing Task size too large errors on many messages. This suggests that Firestore payloads close to 1 MB are being padded in a way that makes the resulting Task exceed 1 MB.
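For anyone who wants to sanity-check this locally, here is a minimal sketch of the kind of measurement I mean. The helper names (`changePayload`, `wouldExceedTaskLimit`) are invented for illustration and are not part of the extension; the assumption is that JSON serialization plus the base64 encoding of the task body add enough overhead to push a near-1 MB document over the Cloud Tasks limit.

```ts
// Hypothetical diagnostic, not extension code: estimate how large the
// enqueued payload would be and compare it to the 1 MB Cloud Tasks limit.
const CLOUD_TASKS_LIMIT_BYTES = 1024 * 1024; // 1 MB maximum task size

function payloadSizeBytes(changePayload: unknown): number {
  // The serialized change record, before any transport-level encoding.
  const json = JSON.stringify(changePayload);
  return Buffer.byteLength(json, "utf8");
}

function wouldExceedTaskLimit(changePayload: unknown): boolean {
  // Assumption: the task body is base64-encoded in transit, which inflates
  // it by roughly a third, so a document just under 1 MB can still overflow.
  return Math.ceil((payloadSizeBytes(changePayload) * 4) / 3) > CLOUD_TASKS_LIMIT_BYTES;
}
```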
Steps to reproduce:
- Install extension version 0.1.50, ensure that Exclude old data payloads is set to yes
- Write a large document to Firestore
Expected result
No Task size too large errors should appear at all.
Actual result
Observing many Task size too large errors in logs.
It occurs to me that we probably shouldn't send the payload to a Cloud Task at all if the maximum task size is 1 MB, since a Firestore document can technically be up to 1 MB in size (even if that is rare).
Perhaps we should just send through the document references and then fetch them in the task handler function. This would add reads to the extension, but it would eliminate this issue entirely.
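To make the suggestion concrete, here is a rough sketch of what I have in mind. The function and queue names (`syncDocToBigQuery`, `enqueueDocumentRef`) and the retry settings are placeholders I made up, not the extension's actual code: the task body carries only the document path, and the handler re-reads the document before exporting it.

```ts
import * as admin from "firebase-admin";
import { getFunctions } from "firebase-admin/functions";
import { onTaskDispatched } from "firebase-functions/v2/tasks";

admin.initializeApp();

// Enqueue only the document path (a few hundred bytes) instead of the full
// document payload, so the task body stays far below the 1 MB limit.
async function enqueueDocumentRef(docPath: string): Promise<void> {
  await getFunctions().taskQueue("syncDocToBigQuery").enqueue({ docPath });
}

export const syncDocToBigQuery = onTaskDispatched<{ docPath: string }>(
  { retryConfig: { maxAttempts: 5 } },
  async (req) => {
    const snap = await admin.firestore().doc(req.data.docPath).get();
    if (!snap.exists) {
      // The document may have been deleted between the trigger and this
      // read -- this is the consistency gap discussed later in the thread.
      return;
    }
    // ...serialize snap.data() and write it to BigQuery here...
  }
);
```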
We are using version 0.1.51 and having the same problem. We know our documents are between 500 KB and 900 KB, so we were hoping EXCLUDE_OLD_DATA would help. @747project, I'm curious whether you know the size of your large documents. We're hoping that if a doc is slightly less than 1 MB, the EXCLUDE_OLD_DATA flag will work.
We're also having trouble importing: the DO_BACKFILL flag appears to be disabled, and the fs-bq-import-collection script is throwing a Request Entity Too Large error. Our current thinking is that the script does not honor the EXCLUDE_OLD_DATA flag.
@larstbone sorry for the delayed response. Unfortunately I don't have a specific size for the documents that are causing the problem, but I do know we have multiple collections containing many documents that are very close to, or just about, 1 MB in size. Either way, if there is a 1 MB cap on a Cloud Task and a 1 MB cap on a Firestore document, then the current task-based solution inherently cannot support all Firestore document scenarios.
Hi all!
So in versions after 0.1.56 this problem should hopefully be almost completely mitigated, as we now use Cloud Tasks only as a last resort (after several retries are made using the streaming API).
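For context, the flow described above looks roughly like this. This is only a sketch of the "streaming first, task queue as last resort" idea; the dataset, table, and queue names, and the retry count, are placeholders rather than the extension's real configuration.

```ts
import { BigQuery } from "@google-cloud/bigquery";
import { getFunctions } from "firebase-admin/functions";

const bigquery = new BigQuery();

// Try the BigQuery streaming API a few times; only hand the row off to a
// Cloud Task if streaming keeps failing.
async function writeChangeRow(row: Record<string, unknown>): Promise<void> {
  const maxAttempts = 3; // placeholder retry budget
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await bigquery
        .dataset("my_dataset")
        .table("my_table_raw_changelog")
        .insert([row]);
      return;
    } catch (err) {
      if (attempt === maxAttempts) {
        // Last resort: enqueue the row for later processing.
        await getFunctions().taskQueue("syncBigQuery").enqueue({ row });
        return;
      }
    }
  }
}
```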
That being said:

> Perhaps we should just send through the document references and then fetch them in the task handler function. This would add reads to the extension, but it would eliminate this issue entirely.

The approach suggested above won't work, as there is no way to guarantee consistency (or even that the document still exists) between enqueuing the reference and reading the document back from Firestore; for example, the document could be updated or deleted before the task runs, so the fetched data would no longer match the change that triggered the export.