🎠 Support any rotation strategy

Open lemonsaurus opened this issue 10 months ago • 0 comments

🩲 Brief

Currently, we have only one rotation strategy available - retain n days of backups.

We should implement support for at least a few other common rotation strategies:

  • Retain latest weekly backup
  • Retain latest monthly backup

It's very common to want to keep n days of backups, plus a weekly and a monthly one. That way you can still restore an older backup if the problem is already present in the more recent ones, for example because you didn't notice it within n days.

But why stop there?

🌐 Global vs local retention

Currently, retention_days is configured globally across all storage types. This is not really necessary, because each storage handler has its own rotate() and could support individual rotation strategies too. So maybe our user wants to do backups on both Dropbox and S3, but while his S3 has basically limitless space (because S3 is stupidly cheap), the Dropbox only has 100GB and our user wants to be a little bit more conservative with how much space is used there.

So, why shouldn't we let our user configure a big fat rotation policy of keeping 30+ backups on S3, but only keep 3 days' worth on Dropbox?

🤓 Universal language for frequency

So, how can we support all of this, but without adding a ton of complexity to our config file? Thankfully, there is a simple solution, because this is pretty much a solved problem. We can just use cron schedule expressions. That'll let us succinctly express any frequency you want to keep backups at, right?

storage:
  s3: # Storage type
    main_s3: # Storage identifier
      rotation_strategies:
        - "* 4 * * *"     # Keep any backup made between 04:00 and 04:59
        - "* 0-6 * * 0"   # Keep any backup made on a Sunday between 00:00 and 06:59
        - "* * 1 1 *"     # Keep any backup made on the 1st of January
        - "* * 1 * *"     # Keep any backup made on the 1st of any month

Weeelll.. This almost works, but it's missing a crucial bit of information. What if I only want to keep the latest three Sunday backups, for example? With cron expressions, I could only express that I want to keep Sunday backups, but not how many!

So, let's bastardize cron expressions, and add a sixth parameter, which will represent the number of backups you want to retain. If set to a natural number (1 or higher), this is how many of the latest backups matching the schedule will be retained. If set to 0, nothing that matches the schedule will be retained - so now we have a mechanism for negation, too!

I present to you: cronbox expressions.

rotation_strategies:
  - "* 4 * * * *"         # Keep any backup made between 04:00 and 04:59
  - "* 0-6 * * 0 1"       # Keep the latest backup made on a Sunday between 00:00 and 06:59
  - "* * 2-30/2 1 * 5"    # Keep the latest 5 backups made on any even day of January (2nd, 4th ..)
  - "5,10 2 1,2 * * 0"    # Delete any backup made on the 1st or 2nd of any month at either 02:05 or 02:10 AM

Look at all that beautiful complexity! Now our users can truly retain any backups their little hearts desire.
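To make the idea a bit more concrete, here is a minimal sketch of how a cronbox expression could be parsed and checked against a backup's timestamp. Everything in it is an assumption for illustration: the names `CronboxRule`, `expand_field`, `parse_cronbox` and `matches` are hypothetical, a `*` in the sixth field is read as "no limit" (inferred from the first example above), all five time fields must match at once (classic cron treats day-of-month and day-of-week differently when both are restricted), and only numbers, `*`, lists, ranges and steps are understood.

```python
from datetime import datetime
from typing import NamedTuple, Optional

# Inclusive value ranges for the five standard cron fields, in order:
# minute, hour, day of month, month, day of week (0 = Sunday).
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


class CronboxRule(NamedTuple):
    """Hypothetical parsed form of one cronbox expression."""
    fields: list          # one set of allowed ints per cron field
    keep: Optional[int]   # None = no limit (assumed), 0 = never keep, n = latest n


def expand_field(spec: str, low: int, high: int) -> set:
    """Expand one cron field ("*", "5", "0-6", "*/2", "1,2") into a set of ints."""
    values = set()
    for part in spec.split(","):
        base, _, step = part.partition("/")
        if base == "*":
            start, end = low, high
        elif "-" in base:
            first, last = base.split("-")
            start, end = int(first), int(last)
        else:
            start = end = int(base)
        values.update(range(start, end + 1, int(step) if step else 1))
    return values


def parse_cronbox(expression: str) -> CronboxRule:
    """Split a six-field cronbox expression into time fields and a keep count."""
    parts = expression.split()
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}: {expression!r}")
    fields = [
        expand_field(spec, low, high)
        for spec, (low, high) in zip(parts[:5], FIELD_RANGES)
    ]
    keep = None if parts[5] == "*" else int(parts[5])
    return CronboxRule(fields, keep)


def matches(rule: CronboxRule, moment: datetime) -> bool:
    """Return True if the timestamp falls on the rule's schedule."""
    # Python's weekday() uses Monday = 0, cron uses Sunday = 0.
    cron_weekday = (moment.weekday() + 1) % 7
    actual = [moment.minute, moment.hour, moment.day, moment.month, cron_weekday]
    return all(value in allowed for value, allowed in zip(actual, rule.fields))
```

With this sketch, `matches(parse_cronbox("* 0-6 * * 0 1"), datetime(2025, 2, 16, 3, 30))` is true, because that timestamp is a Sunday at 03:30. Note that `expand_field("*/2", 1, 31)` yields the odd days 1, 3, 5 and so on, which is why the even-day example above is written as `2-30/2`.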

✅ Okay, so what do we do?

Listen, Linda, it's simple. We just:

  • [ ] Continue supporting retention_days as a global setting, for backwards compatibility purposes. e.g., if retention_days is set to 7, 7 days of backups will always be kept, no matter what other strategies are configured.
  • [ ] Implement support for additional strategies at the Storage config level.
    • [ ] This is configured under storage/<storage_type>/<storage_identifier>/rotation_strategies
    • [ ] This configuration parameter takes a list of cronbox expressions (cron schedule expressions plus the sixth retention-count field)
    • [ ] For each cronbox expression, resolve the expression into a strategy we can use to identify which backups we will keep.
    • [ ] Evaluate the backup storage directory using each of these additional optional strategies, as well as retention_days, in order to create an allowlist of which backups to keep.
    • [ ] Delete all backups that are not in this allowlist.
    • [ ] Support this for all current Storage handlers.
  • [ ] Add some beautiful ✨d o c u m e n t a t i o n ✨ on how this works in the README.
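The allowlist step in the checklist above could look roughly like the sketch below. It is not tied to any existing code: `build_allowlist`, `backups_to_delete` and the `Strategy` type are hypothetical names, each strategy is simplified to a predicate plus a keep count, and the precedence of a 0-count rule over the other strategies is an assumption. The one rule taken directly from the checklist is that retention_days always wins.

```python
from datetime import datetime, timedelta
from typing import Callable, Optional

# A strategy pairs a "does this timestamp match?" predicate with a keep count:
# None = keep all matches, 0 = never keep matches, n = keep the latest n matches.
Strategy = tuple[Callable[[datetime], bool], Optional[int]]


def build_allowlist(
    backups: dict[str, datetime],
    strategies: list[Strategy],
    retention_days: int,
    now: datetime,
) -> set[str]:
    """Return the names of the backups that should survive rotation."""
    newest_first = sorted(backups, key=lambda name: backups[name], reverse=True)
    keep: set[str] = set()
    deny: set[str] = set()

    for predicate, count in strategies:
        matching = [name for name in newest_first if predicate(backups[name])]
        if count == 0:
            deny.update(matching)           # negation rule
        elif count is None:
            keep.update(matching)           # no limit
        else:
            keep.update(matching[:count])   # latest `count` matches only

    # Assumption: a negation rule overrides the other strategies...
    keep -= deny

    # ...but retention_days is always honoured, as the checklist requires.
    cutoff = now - timedelta(days=retention_days)
    keep.update(name for name, made in backups.items() if made >= cutoff)
    return keep


def backups_to_delete(backups: dict[str, datetime], allowlist: set[str]) -> list[str]:
    """Everything that is not on the allowlist gets rotated out."""
    return sorted(set(backups) - allowlist)
```

Keeping the strategies as plain predicates means each Storage handler's rotate() would only need to supply a mapping of backup names to timestamps, so the same function could serve every handler.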

lemonsaurus · Feb 20 '25 20:02