Periodical Snapshotting

Open thevaizman opened this issue 2 years ago • 8 comments

Would love to see auto-snapshotting of the RDB file in DF. Currently I can use SAVE/BGSAVE, but there is no option to configure automatic snapshotting (e.g., Redis's save 60 1000 config directive). I would love to replace Redis with DF, but I don't want to risk data loss or take on the overhead of an external job that periodically dumps the data to disk.

Is there a plan to support this feature in the future?

thevaizman, Jun 18 '22 12:06

Yes, we can support it, though we are unlikely to use the same spec as Redis.

By the way, DF saves timestamped files by default, though it's possible to override this and use a single snapshot file, like with Redis. What would you choose? Timestamped files require some sort of externally configured garbage collection; otherwise you will run out of disk space. In addition, if you run Redis/DF in the cloud, you might want to upload your RDB snapshots to cloud storage.

romange, Jun 18 '22 16:06
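Timestamped snapshots need the external garbage collection mentioned above. Below is a minimal retention sketch in Python, assuming the snapshots are ordinary files in a single directory: it keeps the newest files by modification time and deletes the rest. The function name prune_snapshots, the "*.rdb" pattern and the default of 3 are hypothetical; adjust the pattern to whatever files your DF version actually writes.

```python
from pathlib import Path


def prune_snapshots(directory: str, pattern: str = "*.rdb", keep: int = 3) -> list[Path]:
    """Delete all but the `keep` most recently modified files matching `pattern`.

    Hypothetical helper; returns the deleted paths, newest first.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    # Order candidate snapshot files newest first by modification time.
    snapshots = sorted(
        Path(directory).glob(pattern),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    doomed = snapshots[keep:]
    for path in doomed:
        path.unlink()
    return doomed
```

Running it from cron or a systemd timer shortly after each snapshot would keep disk usage bounded without any change to DF itself.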

Thanks for your response!

I actually DON'T like the Redis spec for periodic snapshotting; I just used it as an example to explain the feature 😄 I would prefer a runtime argument that I can pass to DF to indicate when to dump to disk (e.g., --snapshot-interval=60 would dump every 60 seconds).

In terms of naming the snapshots, I personally override the timestamps, since I only want the most recent dump, so I run my DF instance with --dbfilename=dump.rdb. You do make a valid point about keeping multiple snapshots in cloud storage to enable point-in-time restores, but I'm taking a wild guess that DF isn't going to support uploading to the cloud any time soon (and rightly so 😉), so the user will have to spin up an external component to automate this. Therefore, I don't see the point of adding timestamps on DF's end; just let the user handle naming externally.

thevaizman, Jun 18 '22 16:06

I think the simplest and most versatile approach would be to adopt a glob-based spec. For example, "10:00" would match 10am, while "*:00" would match the start of every hour. A periodic configuration does not fit the use case where one wants to snapshot only during low-load hours.

romange, Jul 09 '22 20:07
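To make the glob idea concrete, here is a sketch that uses Python's fnmatch as a stand-in glob matcher (an assumption; this is not DF's parser). The current time is formatted as a zero-padded 24h "HH:MM" string, so "10:00" matches only 10am and "*:00" matches the top of every hour. The helper name schedule_matches is hypothetical.

```python
from datetime import datetime
from fnmatch import fnmatchcase


def schedule_matches(spec: str, now: datetime) -> bool:
    """Hypothetical helper: does the glob `spec` match `now` as zero-padded 24h "HH:MM"?"""
    # %H and %M are zero-padded, so 3:05am becomes "03:05".
    return fnmatchcase(now.strftime("%H:%M"), spec)


print(schedule_matches("10:00", datetime(2022, 7, 9, 10, 0)))  # True
print(schedule_matches("*:00", datetime(2022, 7, 9, 3, 30)))   # False
```

Because the spec is matched against a formatted string, character classes also work, e.g. "0[1-4]:00" for the hours 01 through 04 only.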

Sounds reasonable to me, although you could argue that during high-load hours there's a bigger risk of losing data due to crashes or failures, which is why I tend to like Redis's periodic configuration. But as long as we can use something like *:00, I believe this is sufficient for most use cases.

thevaizman, Jul 09 '22 21:07

👍🏼

So, the task is:

  1. Introduce a flag save_schedule (or similar) in server_family.cc.
  2. If the flag is not empty, parse it on startup and check that it is a valid glob spec matching HH:MM in 24h time. We should probably not crash on an incorrect value, but log an error and ignore it.
  3. If everything is OK, start a fiber that sleeps in a loop, waking every 20s (20s is fine-grained enough that we won't miss a minute even if the wake-ups drift).
  4. Once the fiber wakes, it should check the current time and match it against the spec. If it matches, call the DoSave() function.
  5. DoSave requires a transaction object. You can create one in the calling fiber; see the Reload(...) function in debugcmd.cc for an example.
  6. Unfortunately, I do not see an easy way to test this in unit tests. However, I introduced a pytest framework under tests/pytest, and we should add a test there that checks this behavior. This item probably depends on #199.

romange, Jul 10 '22 05:07
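The plan above can be sketched in Python under several assumptions: fnmatch stands in for the glob matcher, a blocking loop with time.sleep stands in for a DF fiber, the do_save callback stands in for DoSave() and its transaction, and times are compared in UTC (the thread does not specify a time zone). All function names here are hypothetical. Validation follows step 2 by logging and ignoring a spec that matches no valid HH:MM time. One detail the steps leave implicit: with a 20s poll, a matching minute is normally observed two or three times, so the loop remembers the last minute it saved in and fires at most once per minute.

```python
import logging
import time
from datetime import datetime, timezone
from fnmatch import fnmatchcase
from typing import Callable, Optional


def parse_save_schedule(spec: str) -> Optional[str]:
    """Hypothetical validator: accept `spec` only if it matches some 24h HH:MM time.

    On a bad spec, log an error and return None instead of crashing.
    """
    if not spec:
        return None
    # Try all 1440 minutes of a day.
    for minute_of_day in range(24 * 60):
        hhmm = f"{minute_of_day // 60:02d}:{minute_of_day % 60:02d}"
        if fnmatchcase(hhmm, spec):
            return spec
    logging.error("save schedule %r matches no HH:MM time; ignoring it", spec)
    return None


def run_save_schedule(
    spec: str,
    do_save: Callable[[], None],
    now: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
    sleep: Callable[[float], None] = time.sleep,
    poll_seconds: float = 20.0,
    should_stop: Callable[[], bool] = lambda: False,
) -> None:
    """Hypothetical polling loop: call do_save() at most once per matching minute."""
    last_saved_minute = None
    while not should_stop():
        current = now()
        minute_key = current.strftime("%Y-%m-%d %H:%M")
        # A 20s poll sees each minute several times; save only on the first match.
        if minute_key != last_saved_minute and fnmatchcase(current.strftime("%H:%M"), spec):
            do_save()
            last_saved_minute = minute_key
        sleep(poll_seconds)
```

The now, sleep and should_stop parameters exist so a test can drive the loop with a fake clock instead of waiting in real time, which may also help with the pytest item above.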

@romange Is this issue available to take up?

Nike682631, Aug 11 '22 18:08

This one requires deep knowledge of the DragonflyDB architecture to do correctly. Let's start with other issues for now.

romange, Aug 11 '22 19:08

Sure

Nike682631, Aug 11 '22 19:08

How would you snapshot every 15 minutes using this format?

kaiserdan, Aug 10 '23 21:08
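Under plain glob semantics (again using Python's fnmatch as an assumed stand-in, not DF's actual parser), "every 15 minutes" is awkward: minutes 00, 15, 30 and 45 are not a product of per-digit character classes, so a pattern such as "*:[0134][05]" also matches 05, 10, 35 and 40. Two patterns together do match exactly the quarter hours. The helper minutes_matched is hypothetical.

```python
from fnmatch import fnmatchcase


def minutes_matched(pattern: str, hour: int = 0) -> list[int]:
    """Hypothetical helper: minutes of `hour` whose zero-padded "HH:MM" matches the glob."""
    return [m for m in range(60) if fnmatchcase(f"{hour:02d}:{m:02d}", pattern)]


# One character class per minute digit over-matches the quarter hours.
print(minutes_matched("*:[0134][05]"))  # [0, 5, 10, 15, 30, 35, 40, 45]

# Two patterns together cover exactly 00, 15, 30 and 45.
print(sorted(minutes_matched("*:[03]0") + minutes_matched("*:[14]5")))  # [0, 15, 30, 45]
```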

@kaiserdan we recently introduced a new flag; see https://github.com/dragonflydb/dragonfly/pull/1599 and https://github.com/dragonflydb/dragonfly/issues/1590

We will document it soon; see https://github.com/dragonflydb/documentation/issues/129

romange, Aug 11 '23 20:08