extensions icon indicating copy to clipboard operation
extensions copied to clipboard

[firestore-bigquery-export] Disable changelog table

Open harveyappleton opened this issue 2 years ago • 6 comments

[READ] Step 1: Are you in the right place?

Issues filed here should be about a feature request for a specific extension in this repository. To file a feature request that affects multiple extensions or the Firebase Extensions platform, please reach out to Firebase support directly.

[REQUIRED] Step 2: Extension name

This feature request is for extension: firestore-bigquery-export

What feature would you like to see?

I'd like there to be an option to remove the changelog table in BigQuery as I'm only interested in current state of Firestore being visible in BigQuery, and my changelog would fill up fast and use a lot of space in BigQuery.

How would you use it?

Would use it to disable changelog dataset, as I'm only interested in the current content.

harveyappleton avatar Jun 25 '23 11:06 harveyappleton

Yes please.

madmacc avatar Jul 17 '23 03:07 madmacc

+1

korn101 avatar Aug 11 '23 11:08 korn101

We will have to investigate what's possible here, as I believe the changelog is how the latest views etc are generated. Appreciate that this is popular request, we're looking into it :)

cabljac avatar Jan 31 '24 08:01 cabljac

I would be very great, here we just want the latest version of a document and I've had to setup a job that deletes old snapshots periodicly.

tiagosilveiransj avatar Aug 14 '24 07:08 tiagosilveiransj

To solve this problem, I use a scheduled query that cleans the data once a month. If you have a lot of data, you can reduce the interval and range (reducing the interval without reducing the range will lead to an increase in the cost of completing the task, since it passes through a large amount of data).

The query performed by the task: (*replace projectID and prefix_table_1, prefix_table_2, etc. with your own project ID and tables):

CALL `projectID`.`firestore_export`.`clean_changelog_all`(['prefix_table_1_raw_changelog','prefix_table_2_raw_changelog']);

The procedure that starts the cleanup of each table: (*replace projectID with your own project ID):

CREATE OR REPLACE PROCEDURE `projectID.firestore_export.clean_changelog_all`(tableNames ARRAY<STRING>)
BEGIN
  FOR t IN (SELECT * FROM UNNEST(tableNames) AS name)
  DO
    CALL `projectID.firestore_export.clean_changelog`(t.name);
  END FOR;
END;

The procedure for clearing the table: (*replace projectID with your own project ID and (optional) INTERVAL -1 MONTH):

CREATE OR REPLACE PROCEDURE `projectID.firestore_export.clean_changelog`(tableName STRING)
BEGIN
  DECLARE query STRING;
  SET query = CONCAT("DELETE FROM `projectID.firestore_export.", tableName, "` ","\n",
  "WHERE (document_name, timestamp) IN","\n",
  "(","\n",
  "  WITH latest AS (","\n",
  "    SELECT MAX(timestamp) as timestamp, document_name","\n",
  "    FROM `projectID.firestore_export.",tableName,"`","\n",
  "    GROUP BY document_name","\n",
  "  )","\n",
  "  SELECT (t.document_name, t.timestamp)","\n",
  "  FROM `projectID.firestore_export.",tableName,"` AS t","\n",
  "  JOIN latest  ON (t.document_name = latest.document_name )","\n",
  "  WHERE t.timestamp != latest.timestamp","\n",
  "  AND DATETIME(t.timestamp) < DATE_ADD(CURRENT_DATETIME(), INTERVAL -1 MONTH)","\n",
  ");");
  EXECUTE IMMEDIATE query;
END;

I will also note that deleting the previous version every time you write a new one takes quite a long time, so I do not recommend going this way. And also that update operations are quite slow in BigQuery and are not supported in tables using streaming insertion, so this option is also not suitable.

siarheidudko avatar Aug 22 '24 15:08 siarheidudko

This is the manual solution I used.

DELETE FROM your_dataset.your_table
  WHERE STRUCT(document_name, timestamp) NOT IN (
      SELECT AS STRUCT document_id, MAX(timestamp) AS timestamp
      FROM your_dataset.your_table
      GROUP BY document_name
  )

https://stackoverflow.com/a/78232504/10222449

madmacc avatar Sep 08 '24 21:09 madmacc