🐛 [Stream Firestore to BigQuery] Events stop streaming from Firestore to BigQuery, but fixed through extension update?
- Extension name: firestore-bigquery-export
- Extension version: 0.1.55
Steps to reproduce:
Several months ago the extension started randomly stopping streaming records into BigQuery. Streaming stays almost completely stopped until we upgrade the extension to a new version, and we don't see any errors in the logs or anything. We have one instance of the extension that streams into a non-partitioned table and one that streams into a partitioned table; only the partitioned table seems to be affected.
Expected result
Records continuously stream into BigQuery without interruption.
Actual result
Records are missing from the BigQuery table until we upgrade the extension version.
Hey folks, I'm working with @leighajarett on this problem. What we see is that the extension works fine for a while, then stops writing most events (our Firestore write volume is pretty constant). When we install a new version of the extension, it works again - until it stops later.
Any idea what could be causing this, or even how we can troubleshoot it?
@puf does this chart represent the export count in BigQuery?
It represents the number of events per day; it's a count of the records in the table.
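For reference, this is roughly how we get that count - a minimal sketch using the Node.js BigQuery client, where the project, dataset, and table names are placeholders for our actual raw changelog table:

```ts
import { BigQuery } from "@google-cloud/bigquery";

const bigquery = new BigQuery();

// Count changelog rows per day. The fully-qualified table name is a
// placeholder; substitute the dataset/table your extension instance writes to.
async function eventsPerDay(): Promise<void> {
  const query = `
    SELECT DATE(timestamp) AS day, COUNT(*) AS events
    FROM \`my-project.firestore_export.my_collection_raw_changelog\`
    GROUP BY day
    ORDER BY day
  `;
  const [rows] = await bigquery.query({ query });
  for (const row of rows) {
    console.log(`${row.day.value}: ${row.events}`);
  }
}

eventsPerDay().catch(console.error);
```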
Just to add some more information here - we pinpointed a specific event that is missing from the BigQuery table.
In the logs, we can see this error
We're wondering if things are timing out somewhere? Maybe from an overload of events?
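In case it helps with reproducing this, here's the kind of lookup we used to confirm the event is missing - a sketch, again with placeholder project/dataset/table names, assuming the changelog table's `document_name` and `timestamp` columns:

```ts
import { BigQuery } from "@google-cloud/bigquery";

const bigquery = new BigQuery();

// Check whether any write events for a given Firestore document ever made it
// into the raw changelog table. Document path and table name are placeholders.
async function findEventsForDocument(documentPath: string): Promise<void> {
  const query = `
    SELECT timestamp, operation, event_id
    FROM \`my-project.firestore_export.my_collection_raw_changelog\`
    WHERE document_name LIKE @docSuffix
    ORDER BY timestamp DESC
  `;
  const [rows] = await bigquery.query({
    query,
    params: { docSuffix: `%${documentPath}` },
  });
  console.log(rows.length ? rows : `No events found for ${documentPath}`);
}

findEventsForDocument("my_collection/some-document-id").catch(console.error);
```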
We (Leigha, myself, and our team) have been analyzing a bit further, and these metrics from the Cloud Run task queue associated with one of our extension instances seem pretty conclusive:
In the top chart you can see that:
- We're adding tasks (green line) at a rate of 4-6 million per time slot of 3 hours, which is about 500 per second.
- Tasks are being processed (blue line) at a rate of 1.1 million per 3 hours, so about 100 per second.
- Tasks are initially completed (orange line), but then quickly all start failing (purple line).
In the bottom chart you can see the size of the task queue, which grows to 500 million, presumably its maximum. So... the queue is just not able to keep up with the tasks that the extension is adding to it.
We've just changed the configuration of this queue to a max rate of 500/s (the maximum we can set) to see if that allows it to drain the backlog of tasks, but given the rate at which we're adding tasks, that likely won't be enough for long.
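For anyone who wants to make the same change programmatically rather than in the console, this is roughly what it looks like with the Cloud Tasks Node.js client - a sketch with placeholder project, location, and queue names; use the queue that was created for your extension instance:

```ts
import { CloudTasksClient } from "@google-cloud/tasks";

const client = new CloudTasksClient();

// Raise the queue's max dispatch rate so it can work through the backlog.
// Project, location, and queue id are placeholders for the queue created
// for the extension instance.
async function raiseDispatchRate(): Promise<void> {
  const name = client.queuePath("my-project", "us-central1", "my-extension-queue");
  const [queue] = await client.updateQueue({
    queue: {
      name,
      rateLimits: { maxDispatchesPerSecond: 500 }, // 500/s is the Cloud Tasks maximum
    },
    updateMask: { paths: ["rate_limits.max_dispatches_per_second"] },
  });
  console.log(`Max dispatch rate is now ${queue.rateLimits?.maxDispatchesPerSecond}/s`);
}

raiseDispatchRate().catch(console.error);
```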
We've also upgraded one of our instances of this extension to the new 0.1.56 version, and no longer see the same errors in our logs for that instance.
Five days in, we're still seeing the events being streamed into BigQuery, so 🎉
Hi all! I'm glad this seems to be resolved as of 0.1.56. I'm going to close this as completed. Thanks for your patience!
Feel free to reopen or open a new issue if it happens again.