Periodic Snapshotting
Would love to see auto-snapshotting of the RDB file in DF. Currently I can use SAVE/BGSAVE, but the option to configure automatic snapshotting (e.g. Redis's `save 60 1000` directive) is non-existent.
Would love to replace Redis with DF, but I don't want to take the risk of data loss or the overhead of an external job that periodically dumps the data to disk.
Is there a plan to support this feature in the future?
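(For reference, in redis.conf the directive takes the form `save <seconds> <changes>`: take a snapshot once at least `<changes>` write operations have occurred within `<seconds>` seconds. So `save 60 1000` means "snapshot if at least 1000 keys changed in the last 60 seconds", and several rules can be combined.)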
Yes, we can support it, though it's unlikely we'll use the same spec as Redis.
Btw, DF saves timestamped files by default, though it's possible to override this and use a single snapshot file like with Redis. Which would you choose? Timestamped files require some sort of externally configured garbage collection, otherwise you will eventually run out of disk space. In addition, if you run Redis/DF in the cloud, you might want to upload your RDB snapshots to cloud storage.
Thanks for your response!
I actually DON'T like the redis spec for the periodical snapshotting, I just used it as an example to explain the feature 😄
I would prefer a runtime arg I can supply to DF that indicates when to dump to disk (e.g. `--snapshot-interval=60` would dump every 60 seconds).
In terms of naming the snapshots, I personally choose to override the timestamps since I only want the most recent dump, so I run my DF instance with `--dbfilename=dump.rdb`.
You do make a valid point about saving multiple snapshots in cloud storage to enable restoring to some point in time, but I'm taking a wild guess here that DF isn't going to support the upload operation to the cloud itself any time soon (and rightly so 😉), so the user will have to spin up some external component to automate this. Therefore, I don't see the point in timestamping on DF's end; just let the user handle the naming externally.
I think the simplest and most versatile approach would be to adopt a glob-based spec. For example, "10:00" would match 10am, while "*:00" would match every hour. I don't think a periodic (fixed-interval) configuration fits the use case where one wants to snapshot during low-load hours.
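As a rough illustration of what that matching could look like (this is just a sketch, not Dragonfly code; the function name and structure are made up), comparing the current wall-clock time against an `HH:MM` glob might boil down to:

```cpp
#include <string>

// Sketch only: returns true if `now_hhmm` (e.g. "10:00") matches a glob spec
// such as "10:00", "*:00", or "*:*". Only whole-field '*' wildcards are handled.
bool MatchesTimeGlob(const std::string& spec, const std::string& now_hhmm) {
  size_t cs = spec.find(':');
  size_t cn = now_hhmm.find(':');
  if (cs == std::string::npos || cn == std::string::npos) return false;

  std::string spec_hour = spec.substr(0, cs), spec_min = spec.substr(cs + 1);
  std::string now_hour = now_hhmm.substr(0, cn), now_min = now_hhmm.substr(cn + 1);

  bool hour_ok = (spec_hour == "*") || (spec_hour == now_hour);
  bool min_ok = (spec_min == "*") || (spec_min == now_min);
  return hour_ok && min_ok;
}
```

With this, `MatchesTimeGlob("*:00", "10:00")` and `MatchesTimeGlob("*:00", "23:00")` both return true, while `MatchesTimeGlob("10:00", "11:00")` does not.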
Sounds reasonable to me. Although you could argue that during high-load hours there's a bigger risk of losing data due to crashes/failures, which is why I tend to like Redis's periodic configuration.
But as long as we can use something like `*:00`, I believe this is sufficient for most use cases.
👍🏼
So, the task is:
- Introduce a flag `save_schedule` (or similar) in `server_family.cc`.
- If the flag is not empty, parse it on startup and verify that it fits the glob spec matching `HH:MM` 24h time. We probably should not crash on an incorrect value, but rather output an error log and ignore it.
- If everything is OK, start a fiber that sleeps in a loop every 20s (20s is fine-grained enough to catch every minute even if we drift).
- Once the fiber wakes, it should check the current time and match it against the spec. If it fits, call the `DoSave()` function (a rough sketch of this loop is shown below).
- `DoSave()` requires a transaction object. You can create one in the calling fiber; see the `Reload(...)` function in `debugcmd.cc` for an example.
- I do not see how we can test this easily in unit tests, unfortunately. However, I introduced a pytest framework under `tests/pytest`; we should add a test there that checks this behavior. This item probably depends on #199.
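A minimal sketch of that loop, assuming a plain thread/sleep as a stand-in for Dragonfly's fiber primitives and reusing the `MatchesTimeGlob` helper sketched above (all names here are illustrative; only `DoSave()`, `Reload(...)`, and the 20-second polling interval come from the description above):

```cpp
#include <atomic>
#include <chrono>
#include <cstdio>
#include <ctime>
#include <string>
#include <thread>  // stand-in for the fiber/sleep primitives Dragonfly actually uses

bool MatchesTimeGlob(const std::string& spec, const std::string& now_hhmm);  // from the sketch above

// Sketch: wake every 20 seconds, format the current local time as HH:MM,
// and trigger a save when it matches the configured glob spec.
void SnapshotScheduleLoop(const std::string& spec, const std::atomic<bool>& stop) {
  std::string last_trigger;  // guards against saving twice within the same minute
  while (!stop.load()) {
    std::this_thread::sleep_for(std::chrono::seconds(20));

    std::time_t now = std::time(nullptr);
    char hhmm[6];
    std::strftime(hhmm, sizeof(hhmm), "%H:%M", std::localtime(&now));

    if (MatchesTimeGlob(spec, hhmm) && last_trigger != hhmm) {
      last_trigger = hhmm;
      // In Dragonfly this is where the calling fiber would create a transaction
      // and invoke DoSave(), similar to Reload(...) in debugcmd.cc.
      std::printf("snapshot triggered at %s\n", hhmm);
    }
  }
}
```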
@romange Is this issue available to take up?
This one requires deep knowledge of the DragonflyDB architecture to do correctly. Let's start with other issues for now.
Sure
How would you snapshot every 15 minutes using this format?
@kaiserdan we just recently introduced a new flag: see https://github.com/dragonflydb/dragonfly/pull/1599 and https://github.com/dragonflydb/dragonfly/issues/1590
we will document it soon, see https://github.com/dragonflydb/documentation/issues/129
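For the 15-minute case specifically: assuming the new flag accepts standard 5-field cron syntax, as the linked PR suggests (the exact flag name and accepted format should be confirmed against the upcoming documentation), an expression like `*/15 * * * *` would trigger a snapshot every 15 minutes.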