`path_params` is not included when importing firestore docs to BigQuery

Open jorroll opened this issue 3 years ago • 3 comments

[REQUIRED] Step 2: Describe your configuration

  • Extension name: firestore-bigquery-export
  • Extension version: 0.1.25
  • Configuration values (redact info where appropriate):
    • BIGQUERY_PROJECT_ID=${param:PROJECT_ID}
    • COLLECTION_PATH=organizations/{organizationId}/workspaces
    • DATASET_ID=comms_sandbox_firestore_export
    • DATASET_LOCATION=us-west3
    • LOCATION=us-west3
    • TABLE_ID=workspaces
    • TABLE_PARTITIONING=NONE
    • TIME_PARTITIONING_FIELD_TYPE=omit
    • WILDCARD_IDS=true

Additional configuration information

  • Script name: @firebaseextensions/fs-bq-import-collection
  • Script version: 0.1.14

[REQUIRED] Step 3: Describe the problem

When importing a collection path containing wildcard params using npx @firebaseextensions/fs-bq-import-collection, I expect the imported records to contain a path_params column containing the wildcard path params. Instead, the imported records have a path_params column equal to {} (or an error is thrown, depending on the input). This is the same problem as reported in #947 (which was erroneously closed by #982 even though that PR does not fix this issue).

Note: this only affects the @firebaseextensions/fs-bq-import-collection import script. Incremental changes synced to BigQuery via the firestore-bigquery-export extension have the correct path_params values.

Steps to reproduce:

Run npx @firebaseextensions/fs-bq-import-collection against a collection group whose path contains wildcards. PR #982 attempted to address this issue by adding an internal resolveWildcardIds() function, but that function alone doesn't fix it. The problem is that resolveWildcardIds() expects a sourceCollectionPath argument in the form regions/{regionId}/countries. However, the same sourceCollectionPath is passed directly to firebase.firestore().collectionGroup(sourceCollectionPath), and .collectionGroup() cannot receive a collection ID containing "/" (indeed, an error is thrown saying "Collection IDs must not contain '/'.").

The best solution is probably to allow Collection IDs containing "/" if the queryCollectionGroup argument is true but then call sourceCollectionPath.split("/").at(-1) to get the last piece of the collection path and pass that to the collectionGroup query.

For example:

query = firebase.firestore().collectionGroup(sourceCollectionPath.split("/").at(-1));
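
A slightly more defensive version of the same idea, as a minimal sketch. collectionGroupIdFromPath is a hypothetical helper name, not something that exists in the import script, and the odd-segment check is my own assumption about what counts as a valid collection path:

```typescript
// Hypothetical helper (not part of the import script): derive the ID that
// collectionGroup() accepts from a wildcard path template.
function collectionGroupIdFromPath(sourceCollectionPath: string): string {
  // Drop empty segments so leading or trailing slashes are tolerated.
  const segments = sourceCollectionPath.split("/").filter((s) => s.length > 0);
  const last = segments[segments.length - 1];
  // A collection path alternates collection/document/collection, so it must
  // have an odd number of segments and end on a collection ID.
  if (last === undefined || segments.length % 2 === 0) {
    throw new Error(`Not a collection path: "${sourceCollectionPath}"`);
  }
  return last;
}
```

The result could then be passed to the query, e.g. firebase.firestore().collectionGroup(collectionGroupIdFromPath(sourceCollectionPath)), while the full template is kept around for resolving path_params.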

Additionally...

I'll also note that, as an experienced Firestore developer, I knew that collection groups couldn't receive paths in the form of regions/{regionId}/countries and, for that reason, the path I originally attempted to provide to the CLI was in the form --source-collection-path=countries --query-collection-group=true. It also didn't hurt that the current documentation for the import script specifically says "You cannot use wildcard notation in the collection path (i.e. /collection/{document}/sub_collection}).".

Providing a "correct" path (i.e. without wildcards) doesn't throw an error, but it also doesn't properly parse the path_params. It wasn't until I was debugging the problem and creating this issue that I realized I was supposed to provide a path in the form regions/{regionId}/countries. So another aspect of this issue is the fact that the import script CLI needs better documentation to call out that the --source-collection-path argument expects a path in the form regions/{regionId}/countries if --query-collection-group=true. It should also be noted that, despite being given a path in the form regions/{regionId}/countries, the import script will actually import all collections and subcollections named "countries", since that's how collectionGroup queries work (which some might find surprising, since they are specifying a specific path like regions/{regionId}/countries).
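
To illustrate why the template is still needed after the query: a collectionGroup query returns every document in any collection with the given ID, so each document path has to be matched against the template to recover the wildcard values. A minimal sketch, assuming document paths of the form regions/emea/countries/fr. extractPathParams is a hypothetical helper written for this example, not the script's resolveWildcardIds() implementation:

```typescript
// Hypothetical sketch: match a document path against a wildcard template and
// collect the wildcard values. Not the import script's actual implementation.
function extractPathParams(
  template: string, // e.g. "regions/{regionId}/countries"
  documentPath: string // e.g. "regions/emea/countries/fr"
): Record<string, string> | null {
  const templateSegments = template.split("/").filter((s) => s.length > 0);
  const pathSegments = documentPath.split("/").filter((s) => s.length > 0);

  // A document path is its collection path plus one trailing document ID.
  if (pathSegments.length !== templateSegments.length + 1) {
    return null;
  }

  const params: Record<string, string> = {};
  for (let i = 0; i < templateSegments.length; i++) {
    const expected = templateSegments[i];
    const actual = pathSegments[i];
    const wildcard = /^\{(.+)\}$/.exec(expected);
    if (wildcard) {
      // "{regionId}" in the template captures the segment at this position.
      params[wildcard[1]] = actual;
    } else if (expected !== actual) {
      // A literal segment differs, so this document is from another path.
      return null;
    }
  }
  return params;
}
```

A document under some other "countries" collection returns null here, which a caller could use to skip it or to import it with empty path_params.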

Expected result

I expect the imported records to contain a path_params column containing the wildcard path params.

Actual result

Instead, the imported records have a path_params column equal to {}.

jorroll avatar Sep 29 '22 08:09 jorroll

FYI, I patched this locally and it appears to fix the problem

I.e.

query = firebase.firestore().collectionGroup(sourceCollectionPath.split("/").at(-1));

jorroll avatar Sep 29 '22 08:09 jorroll

Hey @jorroll, Thank you for raising this issue, it's crystal clear. I added it to the project tracker for further investigation.

yamankatby avatar Sep 29 '22 09:09 yamankatby

We would be happy to have this fix in the tool. Thanks to @jorroll for bringing this up!

BenjaminKlatt avatar Oct 09 '22 21:10 BenjaminKlatt

Investigating this now; I've reproduced it.

cabljac avatar Oct 31 '22 12:10 cabljac

This is very much in need of being fixed. For any application that's already in production, this virtually renders BigQuery unusable for historical data. Any workaround to populate the path_params column would be of critical help while this is being fixed. Thanks!

conceptualben avatar Jul 15 '23 22:07 conceptualben

I've just published a new version of the import script which should fix this, as part of a big refactor. Closing this issue now, let me know if there are any issues :)

cabljac avatar Oct 23 '23 09:10 cabljac

@cabljac The same issue still exists; I tested it and it does exist in the backfill script.

filiocorp avatar May 28 '24 22:05 filiocorp