node-firestore-backup
Research possibility of running as a Cloud Function, backing up into Cloud Storage
@yoiang what's the thinking here? In a func you would use this lib to back up your Firestore DB to a GCS file?
That's one possibility! Another that comes to mind is to offload the (currently small) bit of processing done on each document.
I'm honestly not well acquainted with it yet, so I don't know its limitations. For example, would it be possible to spawn additional processes to divide the work of querying and recording collections, then fork again on subcollections, sub-subcollections, and so on?
Let me know if I diverge from the original idea.
In the context of Cloud Functions, this is the perfect fan-out model, and I use it all the time during normal operation of Firestore (denormalization, near-real-time backup to BigQuery).
Using pubsub, a message is published with attributes containing the class and method to be called, and the message payload carrying the class.method params. In this functional model you treat each method as a functional RPC call.
$ backup
• send pubsub -> backup.getCollections
• receive pubsub -> doThings -> pubsub -> backup.getDocs
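Roughly what I mean, as a sketch using the @google-cloud/pubsub and firebase-functions clients; the topic name and the method names are made up:

```typescript
// Sketch of the fan-out: publisher side (run from the CLI) plus a Cloud Function
// consumer. The "firestore-backup" topic and the method names are assumptions.
import { PubSub } from "@google-cloud/pubsub";
import * as functions from "firebase-functions";

const pubsub = new PubSub();

// Publish a "functional RPC": attributes name the method, the payload carries its params.
async function callRemote(method: string, params: object): Promise<void> {
  await pubsub.topic("firestore-backup").publishMessage({
    attributes: { method },
    data: Buffer.from(JSON.stringify(params)),
  });
}

// Consumer: dispatch on the method attribute, fanning out again as needed.
export const backupWorker = functions.pubsub
  .topic("firestore-backup")
  .onPublish(async (message) => {
    // message.data arrives base64-encoded
    const params = JSON.parse(Buffer.from(message.data, "base64").toString());
    switch (message.attributes.method) {
      case "getCollections":
        // list root collections, then publish one getDocs message per collection,
        // e.g. await callRemote("getDocs", { collectionPath: ... })
        break;
      case "getDocs":
        // back up each document in params; publish again for any subcollections
        break;
    }
  });
```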
You can break it out as far as you want, really. In addition, the CLI could technically do all the setup required: create the pubsub topic, publish funcs to handle pubsub messages, etc.
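A sketch of that setup piece, with an assumed topic name (the functions themselves would still be deployed separately, e.g. via `firebase deploy --only functions`):

```typescript
import { PubSub } from "@google-cloud/pubsub";

// One-time setup the CLI could run: create the topic if it doesn't exist yet.
async function ensureTopic(name: string): Promise<void> {
  const pubsub = new PubSub();
  const [exists] = await pubsub.topic(name).exists();
  if (!exists) {
    await pubsub.createTopic(name);
  }
}
```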
@yoiang this is kinda what I was thinking with regards to uploading to GCS. Because the backup flow is serial, the time to back up the DB is much longer, but we can tackle parallelization next. See the forked commit.
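For reference, the upload step itself is small; a sketch with @google-cloud/storage, where the bucket name and destination prefix are placeholders:

```typescript
import { Storage } from "@google-cloud/storage";
import * as path from "path";

// Upload a finished local backup file to a GCS bucket.
async function uploadBackup(localPath: string): Promise<void> {
  const storage = new Storage();
  await storage.bucket("my-firestore-backups").upload(localPath, {
    destination: `backups/${Date.now()}/${path.basename(localPath)}`,
  });
}
```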
Yah, I agree that local parallelization (as opposed to the remote parallelization we're discussing) should be the next task, along with further work towards restoring.
@yoiang have you made progress here?
I was thinking of implementing parallelization by having the CLI call itself, passing a document path for context.
Seems like this would allow a good amount of reuse, plus the ability to offload the work via a different mechanism later.
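Something like this, as a rough sketch (the `--path` flag is hypothetical, it doesn't exist in the CLI yet):

```typescript
import { spawn } from "child_process";

// Re-invoke the CLI with a document path as context, so subtrees can be
// backed up in parallel child processes.
function backupSubtree(documentPath: string): Promise<void> {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, [__filename, "--path", documentPath], {
      stdio: "inherit",
    });
    child.on("exit", (code) =>
      code === 0
        ? resolve()
        : reject(new Error(`backup of ${documentPath} failed (exit ${code})`))
    );
  });
}

// e.g. await Promise.all(rootDocPaths.map(backupSubtree));
```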
Thoughts?