🐛 [firestore-bigquery-export] Wrong reference path passed to firestore doc
Describe your configuration
- Extension name: firestore-bigquery-export
- Extension version: 0.1.24
- Configuration values (redact info where appropriate):
- Firebase project ID: mtp-dev-001
- BigQuery project ID: mtp-dev-001
- Firestore collection path: Users/{userid}/Entries
- Use Collection Group query: Yes
- BigQuery dataset ID: firestore_export
- BigQuery table prefix: userentries
- Documents per import batch: 300
- BigQuery dataset location: us
- Use multithreaded import: Yes
- Use optimized snapshot query: Yes
- Transform function URL: (None)
- Use local Firestore emulator: No
- Failed import output location: (None)
Describe the problem
Steps to reproduce:
I had pre-existing collections in Firestore. While trying to import them from Google Cloud Shell, I hit the error below. I tried different settings, but the same error keeps occurring. The error happens for each document, and no output is produced.
Expected result
The documents should be imported into the destination BigQuery table.
Actual result
{"severity":"INFO","message":"BigQuery dataset already exists: firestore_export"}
{"severity":"WARNING","message":"Did not add partitioning to schema: Partitioning not enabled"}
{"severity":"INFO","message":"Clustering removed on userentries_raw_changelog"}
{"severity":"INFO","message":"Created BigQuery table: userentries_raw_changelog"}
{"severity":"WARNING","message":"Error caught creating table Provided Schema does not match Table mtp-dev-001:firestore_export.userentries_raw_latest. Field path_params is missing in new schema"}
Wait a few seconds for the dataset to initialize...
Importing data from Cloud Firestore Collection (via a Collection Group query): Users/{userid}/Entries, to BigQuery Dataset: firestore_export, Table: userentries_raw_changelog
(node:4445) AutopaginateTrueWarning: Autopaginate will always be set to false in stream paging methods. See more info at https://github.com/googleapis/gax-nodejs/blob/main/client-libraries.md#auto-pagination for more information on how to configure paging calls
(Use `node --trace-warnings ...` to show where the warning was created)
An error has occurred on the following documents, please re-run or insert the following query documents manually... {"endAt":{"before":true,"values":[{"referenceValue":"projects/mtp-dev-001/databases/(default)/documents/Users/0HP7GEgbOXU7hk29x6GJhWJt73z2/Entries/Zqi8SsjVmHLOVQnNxiON","valueType":"referenceValue"}]}}
Error: Value for argument "documentPath" must point to a document, but was "projects/mtp-dev-001/databases/(default)/documents/Users/0HP7GEgbOXU7hk29x6GJhWJt73z2/Entries/Zqi8SsjVmHLOVQnNxiON". Your path does not contain an even number of components.
at Firestore.doc (/home/mygcpuser/node_modules/@google-cloud/firestore/build/src/index.js:702:19)
at AsyncFunction.processDocuments (/home/mygcpuser/node_modules/@firebaseextensions/fs-bq-import-collection/lib/worker.js:41:54)
at MessagePort.<anonymous> (/home/mygcpuser/node_modules/workerpool/src/worker.js:157:27)
at [nodejs.internal.kHybridDispatch] (node:internal/event_target:827:20)
at MessagePort.<anonymous> (node:internal/per_context/messageport:23:28)
An error has occurred on the following documents, please re-run or insert the following query documents manually... {"startAt":{"before":true,"values":[{"referenceValue":"projects/mtp-dev-001/databases/(default)/documents/Users/0HP7GEgbOXU7hk29x6GJhWJt73z2/Entries/Zqi8SsjVmHLOVQnNxiON","valueType":"referenceValue"}]},"endAt":{"before":true,"values":[{"referenceValue":"projects/mtp-dev-001/databases/(default)/documents/Users/1600XwSJ4eeVkBigm1nS/Entries/nfUTcslwh0nFQuN6RbmQ","valueType":"referenceValue"}]}}
Error: Value for argument "documentPath" must point to a document, but was "projects/mtp-dev-001/databases/(default)/documents/Users/0HP7GEgbOXU7hk29x6GJhWJt73z2/Entries/Zqi8SsjVmHLOVQnNxiON". Your path does not contain an even number of components.
at Firestore.doc (/home/mygcpuser/node_modules/@google-cloud/firestore/build/src/index.js:702:19)
at AsyncFunction.processDocuments (/home/mygcpuser/node_modules/@firebaseextensions/fs-bq-import-collection/lib/worker.js:37:52)
at MessagePort.<anonymous> (/home/mygcpuser/node_modules/workerpool/src/worker.js:157:27)
at [nodejs.internal.kHybridDispatch] (node:internal/event_target:827:20)
at MessagePort.<anonymous> (node:internal/per_context/messageport:23:28)
An error has occurred on the following documents, please re-run or insert the following query documents manually... {"startAt":{"before":true,"values":[{"referenceValue":"projects/mtp-dev-001/databases/(default)/documents/Users/1600XwSJ4eeVkBigm1nS/Entries/nfUTcslwh0nFQuN6RbmQ","valueType":"referenceValue"}]},"endAt":{"before":true,"values":[{"referenceValue":"projects/mtp-dev-001/databases/(default)/documents/Users/1u9YyfUpVKudATpd8ql1/Entries/4WiwAWsaC14lRYnoXAMC","valueType":"referenceValue"}]}}
Error: Value for argument "documentPath" must point to a document, but was "projects/mtp-dev-001/databases/(default)/documents/Users/1600XwSJ4eeVkBigm1nS/Entries/nfUTcslwh0nFQuN6RbmQ". Your path does not contain an even number of components.
at Firestore.doc (/home/mygcpuser/node_modules/@google-cloud/firestore/build/src/index.js:702:19)
at AsyncFunction.processDocuments (/home/mygcpuser/node_modules/@firebaseextensions/fs-bq-import-collection/lib/worker.js:37:52)
at MessagePort.<anonymous> (/home/mygcpuser/node_modules/workerpool/src/worker.js:157:27)
at [nodejs.internal.kHybridDispatch] (node:internal/event_target:827:20)
at MessagePort.<anonymous> (node:internal/per_context/messageport:23:28)
An error has occurred on the following documents, please re-run or insert the following query documents manually... {"startAt":{"before":true,"values":[{"referenceValue":"projects/mtp-dev-001/databases/(default)/documents/Users/1u9YyfUpVKudATpd8ql1/Entries/4WiwAWsaC14lRYnoXAMC","valueType":"referenceValue"}]},"endAt":{"before":true,"values":[{"referenceValue":"projects/mtp-dev-001/databases/(default)/documents/Users/1u9YyfUpVKudATpd8ql1/Entries/VsCbbUyQ2Reh7RLYenWG","valueType":"referenceValue"}]}}
Error: Value for argument "documentPath" must point to a document, but was "projects/mtp-dev-001/databases/(default)/documents/Users/1u9YyfUpVKudATpd8ql1/Entries/4WiwAWsaC14lRYnoXAMC". Your path does not contain an even number of components.
at Firestore.doc (/home/mygcpuser/node_modules/@google-cloud/firestore/build/src/index.js:702:19)
at AsyncFunction.processDocuments (/home/mygcpuser/node_modules/@firebaseextensions/fs-bq-import-collection/lib/worker.js:37:52)
at MessagePort.<anonymous> (/home/mygcpuser/node_modules/workerpool/src/worker.js:157:27)
at [nodejs.internal.kHybridDispatch] (node:internal/event_target:827:20)
at MessagePort.<anonymous> (node:internal/per_context/messageport:23:28)
Hi, I'm reviewing your PR. Strangely, I wasn't able to reproduce this issue. I will keep trying to reproduce it and keep you updated.
Hello! Any news on this issue? I seem to be experiencing the same problem. I have the same structure as @YounesAmalou, and the only input that differs is the optimized snapshot query setting. I'd appreciate any update, thanks!
I've noticed that the referenceValue in the serializableQuery is a full path reference that looks similar to
projects/{project_id}/databases/{database_id}/documents/{document_path}
I couldn't find where this value is set, nor any documentation for it besides this reference (referenceValue),
but for the doc(path) method, the path param looks like the {document_path} part of the full path.
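To illustrate the mismatch described above, here is a minimal sketch of turning a full resource name into the relative path that doc(path) accepts. The helper name `toRelativeDocumentPath` and the regular expression are hypothetical and written only for this comment; they are not taken from the extension's source or from the PR. The full name in the logs has nine slash-separated segments (odd), while the relative part `Users/{id}/Entries/{id}` has four (even), which would explain the "even number of components" error.

```typescript
// Hypothetical helper (not the extension's actual code): strips the
// "projects/{project_id}/databases/{database_id}/documents/" prefix from a
// full Firestore resource name, leaving the relative document path.
const RESOURCE_NAME = /^projects\/[^/]+\/databases\/[^/]+\/documents\/(.+)$/;

function toRelativeDocumentPath(referenceValue: string): string {
  const match = RESOURCE_NAME.exec(referenceValue);
  // Assumption: if the prefix is absent, the value is already relative.
  const relativePath = match?.[1] ?? referenceValue;

  // A document path alternates collection/document, so it must have an
  // even, non-zero number of segments.
  const segments = relativePath.split("/").filter((s) => s.length > 0);
  if (segments.length === 0 || segments.length % 2 !== 0) {
    throw new Error(`Not a document path: ${relativePath}`);
  }
  return segments.join("/");
}
```

With a helper like this, the value could be passed on as `firestore.doc(toRelativeDocumentPath(referenceValue))`, assuming `firestore` is an initialized Firestore instance.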
For testing, I ran the script directly in Google Cloud Shell, following the steps in the documentation.
The first time I ran the script, the error showed up. I then patched the script's dist build code directly, and it worked!
Later I needed the script for another collection (this time not a subcollection), and it showed the same error until I patched it so I could continue with my task.
Thanks for the detailed explanation on the PR, @YounesAmalou — I tried the patch and it finally worked!
However, I’m now running into a different issue: not all user documents are being imported into BigQuery. Have you experienced something similar? I’ve checked the logs but haven’t been able to find any indication of a limit on the number of documents or any related restriction. Any insight would be greatly appreciated!
@angelabhouse sorry to hear that, I haven't come across this issue before.
However, to be sure, could you share how you found that the documents don't match?
What I did on my side is:
- Running a query on BigQuery to COUNT(1) the latest table.
- Inside the Query Builder from the Firebase interface, I retrieved the count of the documents with the same path of the query.
- Compared them and found them matching.
Also, you mentioned that you used a different optimized snapshot query setting in the script params. Did you try the other method?
Otherwise, I would suggest creating an issue.
Meanwhile, if you want to continue debugging to find the root cause, try to find the loop and check its length to figure out whether the problem is caused by the query or by something in the environment.
I have the same issue. The script fails to import subcollections.
Just bumping this because I ran into this issue, and verified that the fix proposed in https://github.com/firebase/extensions/pull/2437 solves the problem.
Hi all, I've flagged this for more investigation, and will provide updates when available!
I believe I may have accidentally released a fix for this, forgetting that this PR was opened. I will review to confirm. If so, I'll make sure that @YounesAmalou is properly credited for their contribution.
To confirm: version 0.1.26 of @firebaseextensions/fs-bq-import-collection no longer has this issue.