bug(firestore-bigquery-export): Partitioning column remains null for valid Firestore Timestamp values

Open githinjikamau opened this issue 2 years ago • 7 comments

[REQUIRED] Step 2: Describe your configuration

  • Extension name: firestore-bigquery-export
  • Extension version: firebase/[email protected]
  • Configuration values (redact info where appropriate): image

[REQUIRED] Step 3: Describe the problem

Steps to reproduce:

  • Configure the extension as shown in the image above
  • Create a document in the collection with a Timestamp field, e.g.: image

Expected result

Matching BigQuery record has a created_at column populated with the corresponding Firestore value.

Actual result

Matching BigQuery record has null in the created_at column, and the logs read:

ext-orders-partitoned-fsexportbigqueryjm85infkwxc7

Wrong type of Firestore Field for TimePartitioning. Accepts only strings in BigQuery format (DATE, DATETIME, TIMESTAMP) and Firestore Timestamp. Firestore Document field path: projects/<redacted>/databases/(default)/documents/orders/<redacted>. Field name: dateCreated. Field data: [object Object]. Schema field "created_at" value will be null.

Wrong type of Firestore Field for TimePartitioning. Accepts only strings in BigQuery format (DATE, DATETIME, TIMESTAMP) and Firestore Timestamp. Firestore Document field path: projects/<redacted>/databases/(default)/documents/<redacted>/<redacted>. Field name: dateCreated. Field data: [object Object]. Schema field "created_at" value will be null.

Looks like it has recurred after being fixed in PR #906

githinjikamau avatar Jul 14 '22 10:07 githinjikamau

Thanks @githinjikamau

This error seems to suggest the following validation check is failing...

  private isValidPartitionTypeDate(value) {
    /* Check if valid timestamp value from sdk */
    if (value instanceof firebase.firestore.Timestamp) return true;

    /* Check if valid date/time value from console */
    return Object.prototype.toString.call(value) === "[object Date]";
  }

I'll add to our project board for investigation.
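
The root cause is not confirmed in this thread. As a minimal sketch, assuming the `instanceof` test is what rejects the value (for example because the Timestamp was created by a different copy of the SDK), a structural check would accept any object that carries numeric `seconds` and `nanoseconds` fields. The names `TimestampLike`, `isTimestampLike`, `isValidPartitionValue`, and `toIsoString` are hypothetical and are not part of the extension.

```typescript
// Hypothetical shape of a Firestore Timestamp (assumption: the value
// exposes numeric `seconds` and `nanoseconds` properties).
interface TimestampLike {
  seconds: number;
  nanoseconds: number;
}

// Structural check: does not depend on which class created the value.
function isTimestampLike(value: unknown): value is TimestampLike {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.seconds === "number" && typeof v.nanoseconds === "number";
}

// Hypothetical variant of the check quoted above: accepts Timestamp-shaped
// objects and native Date instances, rejects everything else.
function isValidPartitionValue(value: unknown): boolean {
  if (isTimestampLike(value)) return true;
  return Object.prototype.toString.call(value) === "[object Date]";
}

// Convert an accepted value to an ISO 8601 string (millisecond precision).
function toIsoString(value: TimestampLike | Date): string {
  if (value instanceof Date) return value.toISOString();
  const millis = value.seconds * 1000 + Math.floor(value.nanoseconds / 1e6);
  return new Date(millis).toISOString();
}
```

A structural check trades strictness for robustness: it would also accept a plain object that merely looks like a Timestamp, so it is a diagnostic aid rather than a confirmed fix.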

dackers86 avatar Jul 14 '22 12:07 dackers86

Would it be possible to explain how data is currently added to the database?

For example, is it added manually through the Firebase console or through the Firebase SDK?

dackers86 avatar Jul 14 '22 13:07 dackers86

Hey @dackers86,

Thank you for your quick response. The data is added via the Firebase SDK.

githinjikamau avatar Jul 15 '22 06:07 githinjikamau

Further to this and referring back to my closed issue I raised in May last year I can also confirm that the creation of a partitioned table still doesn't appear to work. I have also attempted to configure the extension in a similar manner to @githinjikamau.

My conclusion is that other than the required configuration options i.e., dataset id, source Firestore collection and destination table prefix none of the subsequent options do anything at all.

I've tried all of them in various combinations and no matter what I set my options to, I always end up with a non partitioned table which seemingly ignores all the settings I've provided?

Here's an extract from the extension's log when attempting to back fill the BigQuery table from Firestore using the script:-

$ npx @firebaseextensions/fs-bq-import-collection

Importing data from Cloud Firestore Collection: sales, to BigQuery Dataset: firestore_stream_v2, Table: sales_by_month_raw_changelog
{"severity":"INFO","message":"Creating BigQuery dataset: firestore_stream_v2"}
{"severity":"INFO","message":"Created BigQuery dataset: firestore_stream_v2"}
{"severity":"INFO","message":"Creating BigQuery table: sales_by_month_raw_changelog"}
{"severity":"WARNING","message":"No valid table reference is available. Skipping partitioning"}
{"severity":"WARNING","message":"Cannot partition an existing table firestore_stream_v2_sales_by_month_raw_changelog"}
{"severity":"WARNING","message":"Cannot partition an existing table firestore_stream_v2_sales_by_month_raw_changelog"}
{"severity":"INFO","message":"Clustering removed on sales_by_month_raw_changelog"}
{"severity":"INFO","message":"Created BigQuery table: sales_by_month_raw_changelog"}
{"severity":"WARNING","message":"No valid table reference is available. Skipping partitioning"}
{"severity":"WARNING","message":"Cannot partition an existing table firestore_stream_v2_sales_by_month_raw_latest"}

Our live sales data is continuing to stream without issue, but the table is so large now that it's really starting to take significant time to query it for anything useful. I'm not a Big Query expert by any means, but if we could just get the extension to create a partitioned table by month using the 'created' date in our Firestore 'sales' document (which is at root level and a valid Firestore.Timestamp instance) this would be extremely useful!
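
For reference, the monthly partitioning described above corresponds to a `timePartitioning` setting in BigQuery table metadata, with `type` set to `MONTH` and `field` naming a top-level TIMESTAMP, DATE, or DATETIME column. The following is a minimal sketch of that metadata shape only; `buildPartitionedTableMetadata` and the interfaces are hypothetical helpers, not the extension's code, and the single-column schema is an assumption for illustration.

```typescript
// Granularities supported by BigQuery time-unit partitioning.
type PartitionType = "HOUR" | "DAY" | "MONTH" | "YEAR";

interface TimePartitioning {
  type: PartitionType;
  field?: string; // when omitted, BigQuery partitions by ingestion time
}

// Hypothetical, trimmed-down view of table metadata (illustration only).
interface TableMetadataSketch {
  schema: { fields: { name: string; type: string; mode?: string }[] };
  timePartitioning: TimePartitioning;
}

// Build metadata for a table partitioned on a top-level TIMESTAMP column.
function buildPartitionedTableMetadata(
  column: string,
  granularity: PartitionType
): TableMetadataSketch {
  return {
    schema: {
      fields: [{ name: column, type: "TIMESTAMP", mode: "NULLABLE" }],
    },
    timePartitioning: { type: granularity, field: column },
  };
}

// Example: partition by month on the 'created' column mentioned above.
const metadata = buildPartitionedTableMetadata("created", "MONTH");
console.log(JSON.stringify(metadata.timePartitioning));
```

Partitioning has to be specified when a table is created. BigQuery does not convert an existing non-partitioned table in place, which is consistent with the "Cannot partition an existing table" warnings in the log.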

If you require any more information that I haven't already provided in my (now closed) issue or this one, I'll gladly try to provide it.

One further thing that might be useful from the Firestore Timestamp side of things (not the partitioning problem) is that we use a sentinel value for the 'created' date/time, i.e., the server sets this at the time of writing the document, so maybe there's something subtly different with this as opposed to @githinjikamau's use of the SDK?

Regards

Benj

soarb avatar Jul 22 '22 09:07 soarb

Thanks @soarb

I'll retest using a server timestamp and post an example. I'll need to look up what I had tried originally.

Re: the partitioning problem, could you create a new issue for discussion? This may also be an existing issue in the repository, which I could link any updates to.

dackers86 avatar Jul 22 '22 10:07 dackers86

Brilliant, thanks @dackers86 :) ... I'll create a new issue ... should arrive within the next hour or so, although it will be quite similar to issue 621 I suspect which is closed.

Forgive me if there are any similar open issues I've missed, but the partitioning hasn't worked since I first started using this extension back in May last year :(

soarb avatar Jul 22 '22 11:07 soarb

New issue created #1059 - please let me know if I can provide any further information.

soarb avatar Jul 22 '22 13:07 soarb

@dackers86 I can also confirm that all values for my partitioned table's column are null for Firestore documents which have valid Firestore Timestamp values.

soarb avatar Jan 13 '23 14:01 soarb