
🐛 [Stream Firestore to BigQuery] Events stop streaming from firestore to bigquery, but fixed through extension update?

Open leighajarett opened this issue 1 year ago • 3 comments

  • Extension name: firestore-bigquery-export
  • Extension version: 0.1.55

Steps to reproduce:

Several months ago the extension started randomly stopping streaming records into BigQuery. Streaming nearly completely stops until we upgrade the extension to a new version. We don't see any errors in the logs. We run two instances of the extension: one streams into a non-partitioned table and one into a partitioned table. Only the partitioned table seems to be affected.

Expected result

Records continuously stream into BigQuery without interruption.

Actual result

Records are missing from the BigQuery table until we upgrade to a new version of the extension.

leighajarett avatar Oct 21 '24 19:10 leighajarett

Hey folks, I'm working with @leighajarett on this problem. What we see is that the extension works fine for a while, then stops writing most events (our Firestore write volume is fairly constant). When we install a new version of the extension, it works again, until it stops again later.

[screenshot: chart of daily record counts in the BigQuery table]

Any idea what could be going on to cause this, or even how we can troubleshoot it?

puf avatar Oct 21 '24 20:10 puf

@puf does this chart represent exports count in BigQuery?

pr-Mais avatar Oct 22 '24 16:10 pr-Mais

It represents the number of events per day; it's a count of the records in the table.

leighajarett avatar Oct 22 '24 19:10 leighajarett

Just to add some more information here: we pinpointed a specific event that is missing from the BigQuery table.

In the logs, we can see this error: [screenshot of the error message]

We're wondering if things are timing out somewhere? Maybe from an overload of events?

leighajarett avatar Nov 14 '24 18:11 leighajarett

We (Leigha, myself, and our team) have been analyzing a bit further, and these metrics from the Cloud Run task queue associated with one of our extension instances seem pretty conclusive:

[screenshot: Cloud Tasks queue metrics, task rates (top) and queue depth (bottom)]

In the top chart you can see that:

  • We're adding tasks (green line) at a rate of 4-6 million per 3-hour time slot, which is roughly 500 per second.
  • Tasks are being processed (blue line) at a rate of about 1.1 million per 3 hours, so about 100 per second.
  • Tasks are initially completed (orange line), but then nearly all of them quickly start failing (purple line).

In the bottom chart you can see the size of the task queue, which grows to 500 million, presumably its maximum. So the queue simply cannot process tasks as fast as the extension is adding them.
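As a back-of-the-envelope check of those numbers (taking ~5M tasks added and ~1.1M processed per 3-hour slot from the charts; the exact figures are rough readings, not precise values):

```python
# Sanity-check the queue rates described above.
window_s = 3 * 60 * 60  # one 3-hour time slot, in seconds

enqueue_per_s = 5_000_000 / window_s    # tasks added per second
dispatch_per_s = 1_100_000 / window_s   # tasks processed per second
growth_per_s = enqueue_per_s - dispatch_per_s  # net backlog growth

print(round(enqueue_per_s))   # ~463 tasks/s added
print(round(dispatch_per_s))  # ~102 tasks/s processed
print(round(growth_per_s))    # ~361 tasks/s net backlog growth

# At that rate, the time for the backlog to reach 500M tasks:
days_to_500m = 500_000_000 / growth_per_s / 86_400
print(round(days_to_500m))    # ~16 days
```

So the observed rates are consistent with the queue hitting its 500M ceiling within a couple of weeks of steady-state traffic.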

We've just changed the configuration of this queue to a max rate of 500/s (the maximum we can set) to see if that allows it to drain the backlog of tasks, but given the rate at which we're adding tasks, that likely won't be enough for long.
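The same arithmetic shows why the 500/s cap is unlikely to be enough: if tasks keep arriving at roughly the observed rate, the net drain is tiny (this assumes the enqueue rate stays constant, which is only a rough approximation):

```python
# Why a 500/s dispatch cap may not drain a 500M-task backlog.
enqueue_per_s = 5_000_000 / (3 * 60 * 60)  # ~463/s still being added
max_dispatch_per_s = 500                   # new queue max rate setting

net_drain_per_s = max_dispatch_per_s - enqueue_per_s
print(round(net_drain_per_s))  # ~37 tasks/s net drain

backlog = 500_000_000
days_to_drain = backlog / net_drain_per_s / 86_400
print(round(days_to_drain))    # ~156 days, if rates stay constant
```

With only ~37 tasks/s of headroom, draining the full backlog would take months, so raising the dispatch cap alone can't keep up with this write volume.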

We've also upgraded one of our instances of this extension to the new 0.1.56 version, and no longer see the same errors in our logs for that instance.

puf avatar Nov 14 '24 21:11 puf

Five days in, we're still seeing the events being streamed into BigQuery, so 🎉

puf avatar Nov 19 '24 18:11 puf

Hi all! I'm glad this seems to be resolved as of 0.1.56, I'm going to close this as completed, thanks for your patience!

Feel free to reopen or open a new issue if it happens again.

cabljac avatar Feb 11 '25 11:02 cabljac