
Research possibility of running Cloud Function into Cloud Storage

Open yoiang opened this issue 8 years ago • 6 comments

yoiang avatar Nov 02 '17 18:11 yoiang

@yoiang what's the thinking here? In a function you'd use this lib to back up your Firestore db to a GCS file?

jeremylorino avatar Feb 04 '18 05:02 jeremylorino

That's one possibility! Another that comes to mind is offloading the (currently small amount of) processing done on each document.

I'm honestly not well acquainted with Cloud Functions yet, so I don't know their limitations. For example, would it be possible to spawn additional processes to divide the work of querying and recording collections, then fork again on subcollections, sub-subcollections, etc.?

yoiang avatar Feb 04 '18 16:02 yoiang

Let me know if I diverge from the original idea.

In the context of Cloud Functions, this is the perfect fan-out model, and I use it all the time during normal operation of Firestore (denormalization, near-real-time backup to BigQuery).

Using Pub/Sub, a message is published with attributes naming the class and method to be called, and the message payload carrying the `class.method` params. In this functional model you treat each method as a functional RPC call:

`$ backup` → send pubsub → `backup.getCollections` → receive pubsub → doThings → send pubsub → `backup.getDocs`
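The dispatch pattern above could be sketched roughly like this. Note this is a minimal illustration, not this repo's code: the `handlers` object, its methods, and the message shape are all assumptions standing in for the real backup steps.

```javascript
// Sketch of the Pub/Sub fan-out dispatch: message attributes route the
// call (class + method), the payload carries the method's parameters.

// Hypothetical handler registry standing in for the real backup class.
const handlers = {
  backup: {
    getCollections: ({ root }) => [`${root}/users`, `${root}/posts`],
    getDocs: ({ collection }) => [`${collection}/doc1`, `${collection}/doc2`],
  },
};

// Publish side: build a Pub/Sub-style message.
// Attributes say what to call; data (base64, as Pub/Sub delivers it)
// holds the params.
function buildMessage(klass, method, params) {
  return {
    attributes: { class: klass, method },
    data: Buffer.from(JSON.stringify(params)).toString('base64'),
  };
}

// Receive side: treat each message as a functional RPC call.
function dispatch(message) {
  const { class: klass, method } = message.attributes;
  const params = JSON.parse(Buffer.from(message.data, 'base64').toString());
  return handlers[klass][method](params);
}
```

Each result of `backup.getCollections` would then be re-published as a `backup.getDocs` message, which is where the fan-out happens.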

You can break it out as far as you want, really. In addition, the CLI could technically do all the setup required: create the Pub/Sub topic, deploy the functions that handle the Pub/Sub messages, etc.

jeremylorino avatar Feb 04 '18 17:02 jeremylorino

@yoiang this is kinda what I was thinking with regard to uploading to GCS. Because the backup flow is serial, the time to back up the db is much longer, but we can tackle parallelization next. forked commit

jeremylorino avatar Feb 04 '18 22:02 jeremylorino

Yeah, I agree that local parallelization (as opposed to the remote parallelization we're discussing) should be the next task, along with further work on restoring.

yoiang avatar Feb 28 '18 17:02 yoiang

@yoiang have you made progress here?

I was thinking of implementing parallelization by having the CLI call itself, passing a document path for context.

It seems like this would allow a good amount of reuse, plus the ability to offload the work via a different mechanism later.

Thoughts?

jeremylorino avatar Mar 04 '18 00:03 jeremylorino