
Ability to only backup certain documents?

Open Zloka opened this issue 4 years ago • 4 comments

Hi! This is more of a question/feature request.

Under the Import data section, there is this specific line of text:

If a document in your database is not affected by an import, it will remain in your database after the import.

I have a few collections that are part of a collaborative tool for creating and managing files. Each document is only ever created, never updated. Whenever someone makes a change, they create a new document based on the existing one, and the latest version is determined using timestamps.

As such, I only ever have to back up each document in my collection(s) once. Do you know if it would be possible to back up only certain documents in a collection based on a timestamp, namely those created after the previous backup? Since my collections are ever-growing (unless I decide to retain only the X latest versions), that would be very useful!
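The idea can be sketched as a simple filter over creation timestamps. This is a minimal illustration, not Backup Fire's actual API; the `createdAt` field name and the `docsToBackup` function are made up for the example:

```typescript
// Hypothetical sketch: select only documents created after the last backup.
// A real implementation would express this as a Firestore query rather than
// an in-memory filter, but the selection logic is the same.
interface Doc {
  id: string;
  createdAt: number; // assumed field, e.g. a Unix timestamp in milliseconds
}

function docsToBackup(docs: Doc[], lastBackupAt: number): Doc[] {
  return docs.filter((doc) => doc.createdAt > lastBackupAt);
}

// Example: with the previous backup at t=100, only newer documents qualify.
const docs: Doc[] = [
  { id: "a", createdAt: 50 },
  { id: "b", createdAt: 150 },
  { id: "c", createdAt: 200 },
];
console.log(docsToBackup(docs, 100).map((d) => d.id)); // [ 'b', 'c' ]
```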

Zloka avatar Jun 03 '21 16:06 Zloka

Hey Zloka!

First of all, could you please help me understand your motivation? If I'm reading it right, you're concerned that your backups would store redundant copies of data that never changes? Could you please tell me a bit more about what concerns you?

If you know, could you please tell me the number of documents in the collection and how many documents are created every day?

Knowing that will help me address your problem in the best way possible.


Answering your question: it is possible, but there's no easy way to do it. I had an experimental incremental backup in the works, but the challenge is building a bulletproof system. Backing up the complete database guarantees its integrity: since it's done via internal Google Cloud processes, we can be sure the backup is always complete, no matter how big the database is.

Backing up selected documents, on the other hand, means querying a collection from a function that is subject to memory and timeout limits. That would require splitting the work across multiple function runs and writing to Google Cloud incrementally.
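A rough sketch of what that chunked approach could look like, with pagination simulated by a cursor over an in-memory array. All names here are illustrative assumptions, not Backup Fire's API; a real implementation would page through a Firestore query with a document cursor and write each batch to Cloud Storage:

```typescript
// Hypothetical sketch of a chunked export that stays within function limits.
// Each "run" processes at most `batchSize` documents and resumes from a
// cursor, the way separate Cloud Function invocations would have to.
function exportInBatches<T>(
  docs: T[],
  batchSize: number,
  write: (batch: T[]) => void
): number {
  let runs = 0;
  for (let cursor = 0; cursor < docs.length; cursor += batchSize) {
    // One simulated function invocation: read a page, write it incrementally.
    write(docs.slice(cursor, cursor + batchSize));
    runs++;
  }
  return runs; // number of function runs the export required
}

const written: number[][] = [];
const runs = exportInBatches([1, 2, 3, 4, 5], 2, (b) => written.push(b));
console.log(runs); // 3 — batches [1,2], [3,4], [5]
```

The bulletproofing problem mentioned above lives in the gaps between runs: if one invocation fails or times out mid-batch, the system must detect that and resume without losing or duplicating documents.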

kossnocorp avatar Jun 04 '21 03:06 kossnocorp

@kossnocorp Hey, thanks for getting back to me so quickly! :)

First of all, could you please help me understand your motivation? If I'm reading it right, you're concerned that your backups would store redundant copies of data that never changes? Could you please tell me a bit more about what concerns you?

Sure! As each document never changes, and we create new versions by simply creating new documents, the total number of documents is ever-growing. Since each backed-up (exported) document incurs a read, the cost of backing up is also ever-growing.

However, since a document never changes, backing up the same document multiple times is quite redundant: it would instead be possible to restore the database from multiple incremental backups.

If you know, could you please tell me the number of documents in the collection and how many documents are created every day?

In its current state, we are only talking tens of thousands of documents. We put the system into use this week, so I don't have much historical data to go on, but daily writes currently seem to fluctuate between 500 and 1,500.

Scalability is my main concern, though. Assuming 20,000 documents, a daily backup would cost only $4.38 per annum, so not that bad. But let's (hypothetically) say we hire more people and whip up 3,000 new documents a day, for a total of 1,095,000 new documents in a year; backing those up every day would already cost ~$240 annually. Sure, not a big cost, but it is an ever-increasing sum, and it wouldn't be a thing if robust incremental backups were available 🤔

Of course, the math here is a bit naive, but it should illustrate my point.
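For reference, the figures above follow from Firestore's $0.06 per 100,000 document reads, assuming one full backup per day. A back-of-the-envelope version of the same (admittedly naive) math:

```typescript
// Firestore pricing: $0.06 per 100,000 document reads.
const READ_COST = 0.06 / 100_000; // dollars per document read

// Annual cost of exporting `docCount` documents once a day, every day,
// ignoring growth during the year and storage/write costs.
function annualBackupCost(docCount: number): number {
  return docCount * 365 * READ_COST;
}

console.log(annualBackupCost(20_000).toFixed(2)); // "4.38"
console.log(Math.round(annualBackupCost(1_095_000))); // 240
```

The ~$240 figure is the run rate once the collection has reached 1,095,000 documents; the cumulative first-year cost would be lower, since the collection grows toward that size over the year.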

Zloka avatar Jun 04 '21 04:06 Zloka

I got you! So the price is the main concern, and as I understand it, the current approach is fine and will only become a problem with a significantly increased load.

I have to think about that and research a little more; I'll get back to you once I have something.

kossnocorp avatar Jun 04 '21 05:06 kossnocorp

@kossnocorp Cool! 🙂

For now, I will simply make complete backups. In the long run I'll need either incremental backups or to periodically purge sufficiently old data to keep the number of documents manageable, but if you do happen to work out an incremental backup feature, I would be very interested in using your service! 🙂

Zloka avatar Jun 04 '21 05:06 Zloka