`path_params` is not included when importing firestore docs to BigQuery
[REQUIRED] Step 2: Describe your configuration
- Extension name: firestore-bigquery-export
- Extension version: 0.1.25
- Configuration values (redact info where appropriate):
- BIGQUERY_PROJECT_ID=${param:PROJECT_ID}
- COLLECTION_PATH=organizations/{organizationId}/workspaces
- DATASET_ID=comms_sandbox_firestore_export
- DATASET_LOCATION=us-west3
- LOCATION=us-west3
- TABLE_ID=workspaces
- TABLE_PARTITIONING=NONE
- TIME_PARTITIONING_FIELD_TYPE=omit
- WILDCARD_IDS=true
Additional configuration information
- Script name: @firebaseextensions/fs-bq-import-collection
- Script version: 0.1.14
[REQUIRED] Step 3: Describe the problem
When importing a collection path containing wildcard params using npx @firebaseextensions/fs-bq-import-collection, I expect the imported records to contain a path_params column containing the wildcard path params. Instead, the imported records have a path_params column equal to {} (or an error is thrown, depending on the input). This is the same problem as reported in #947 (which was erroneously closed by #982 even though that PR does not fix this issue).
Note: this only affects the @firebaseextensions/fs-bq-import-collection import script. Incremental changes synced to BigQuery via the firestore-bigquery-export extension have the correct path_params values.
Steps to reproduce:
Attempt to use npx @firebaseextensions/fs-bq-import-collection on a collection group containing wildcards. PR #982 attempted to address this issue by adding an internal resolveWildcardIds() function, but that function alone doesn't fix it. The problem is that resolveWildcardIds() expects to receive a sourceCollectionPath argument in the form regions/{regionId}/countries. However, that same sourceCollectionPath argument is passed directly to firebase.firestore().collectionGroup(sourceCollectionPath), and .collectionGroup() cannot receive a collection ID containing "/" (indeed, an error will be thrown saying "Collection IDs must not contain '/'.").
The best solution is probably to allow a sourceCollectionPath containing "/" when the queryCollectionGroup argument is true, and then call sourceCollectionPath.split("/").at(-1) to get the last segment of the collection path and pass that to the collectionGroup query.
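To illustrate why the template form of the path matters, here is a minimal sketch of how wildcard values could be read out of a document path. extractPathParams is a hypothetical helper written for this comment. It is not the extension's resolveWildcardIds() and I'm not claiming the script works exactly this way.

```typescript
// Hypothetical helper for illustration only; NOT the extension's resolveWildcardIds().
// Reads the value of each {wildcard} segment in `template` from the segment at the
// same position in `documentPath`.
function extractPathParams(
  template: string,
  documentPath: string
): Record<string, string> {
  const templateSegments = template.split("/").filter((s) => s.length > 0);
  const pathSegments = documentPath.split("/").filter((s) => s.length > 0);
  const params: Record<string, string> = {};

  templateSegments.forEach((segment, i) => {
    const match = /^\{(.+)\}$/.exec(segment);
    const name = match?.[1];
    const value = pathSegments[i];
    if (name !== undefined && value !== undefined) {
      params[name] = value;
    }
  });
  return params;
}

// With the wildcard template, the region ID can be recovered:
console.log(
  extractPathParams("regions/{regionId}/countries", "regions/europe/countries/france")
); // { regionId: "europe" }

// With a bare collection ID there is nothing to resolve:
console.log(extractPathParams("countries", "regions/europe/countries/france")); // {}
```

Under this sketch, a bare collection ID such as countries yields an empty object, which would be consistent with the {} I'm seeing in path_params.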
For example:
query = firebase.firestore().collectionGroup(sourceCollectionPath.split("/").at(-1));
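A slightly more defensive version of the one-liner above might look like the following sketch. collectionGroupIdFromPath is a hypothetical name, and the validation rules (rejecting an empty path or a path ending in a wildcard) are my own assumptions rather than anything the import script does today.

```typescript
// Hypothetical helper; the name and the validation rules are assumptions.
// Returns the last segment of a collection path so it can be handed to
// collectionGroup(), which rejects IDs containing "/".
function collectionGroupIdFromPath(sourceCollectionPath: string): string {
  const segments = sourceCollectionPath.split("/").filter((s) => s.length > 0);
  const last = segments[segments.length - 1];
  if (last === undefined) {
    throw new Error("sourceCollectionPath must not be empty");
  }
  if (/^\{.*\}$/.test(last)) {
    throw new Error("sourceCollectionPath must end in a collection ID, not a wildcard");
  }
  return last;
}

console.log(collectionGroupIdFromPath("regions/{regionId}/countries")); // "countries"
console.log(collectionGroupIdFromPath("countries")); // "countries"
```

The result would then be what gets passed to firebase.firestore().collectionGroup(...), while the full template stays available for resolving path_params.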
Additionally...
I'll also note that, as an experienced Firestore developer, I knew that collection groups couldn't receive paths in the form of regions/{regionId}/countries and, for that reason, the path I originally attempted to provide to the CLI was in the form --source-collection-path=countries --query-collection-group=true. It also didn't help that the current documentation for the import script specifically says "You cannot use wildcard notation in the collection path (i.e. /collection/{document}/sub_collection}).".
Providing a "correct" path (i.e. without wildcards) doesn't throw an error but also doesn't properly parse the path_params. It wasn't until I was debugging the problem and creating this issue that I realized I was supposed to provide a path in the form of regions/{regionId}/countries. So another aspect to this issue is the fact that the import script CLI needs better documentation to call out that the --source-collection-path argument expects a path in the form of regions/{regionId}/countries if --query-collection-group=true. It should also be noted that, despite providing a path in the form of regions/{regionId}/countries, the import script will actually import all collections and subcollections named "countries", since that's how collectionGroup queries work (which some might find surprising, since they are specifying a specific path like regions/{regionId}/countries).
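If someone did want only the documents that sit under the given template, one possible approach (not something the import script does, as far as I can tell) would be to filter the collection group results by path. matchesCollectionTemplate below is a hypothetical helper to show the idea.

```typescript
// Hypothetical filter for illustration; not part of the import script.
// Returns true when `documentPath` is a document directly inside a collection
// whose path matches `template`, treating {wildcard} segments as "any ID".
function matchesCollectionTemplate(template: string, documentPath: string): boolean {
  const templateSegments = template.split("/").filter((s) => s.length > 0);
  const pathSegments = documentPath.split("/").filter((s) => s.length > 0);

  // A document path is its collection path plus one trailing document ID.
  if (pathSegments.length !== templateSegments.length + 1) {
    return false;
  }
  return templateSegments.every(
    (segment, i) => /^\{.+\}$/.test(segment) || segment === pathSegments[i]
  );
}

const countriesTemplate = "regions/{regionId}/countries";
console.log(matchesCollectionTemplate(countriesTemplate, "regions/europe/countries/france")); // true
console.log(matchesCollectionTemplate(countriesTemplate, "archive/2020/countries/france")); // false
```

A collectionGroup("countries") query would return both of the documents above, so a check like this would have to run on each document's path after the query.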
Expected result
I expect the imported records to contain a path_params column containing the wildcard path params.
Actual result
Instead, the imported records have a path_params column equal to {}.
FYI, I patched this locally and it appears to fix the problem
I.e.
query = firebase.firestore().collectionGroup(sourceCollectionPath.split("/").at(-1));
Hey @jorroll, Thank you for raising this issue, it's crystal clear. I added it to the project tracker for further investigation.
We would be happy to have this fix in the tool. Thanks to @jorroll for bringing this up!
investigating this now, reproduced it.
This is very much in need of being fixed. For any application that's already in production, this virtually renders BigQuery unusable for historical data. Any workaround to populate the path_params column would be of critical help while this is fixed. Thanks!
I've just published a new version of the import script which should fix this, as part of a big refactor. Closing this issue now, let me know if there are any issues :)
@cabljac The same issue still exists; I tested it and it does exist in the backfill script.