
Global Prune and Check Schedule per Repository


Summary

As a K8up administrator, I want the restic jobs prune and check to be scheduled once per restic repository on a certain schedule, so that no overlapping jobs are running.

Context

From a user's perspective, it makes no sense to define Prune and Check on their own. All the users should care about is that the minimum retention for backups is guaranteed. Prunes and Checks are exclusive maintenance jobs that can be left to K8up itself.

As a consequence, the idea is to have a single schedule per restic repository, not per Schedule object: define global, configurable cron schedules, one for Prunes and one for Checks. These jobs get scheduled once per restic repository for the whole cluster. In other words, there can be many Backups targeting the same repository, but only one Prune and one Check.
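A minimal sketch of how such global schedules could be configured, assuming the operator reads them from environment variables. The variable names and defaults below are hypothetical illustrations, not existing K8up settings:

```go
package main

import (
	"fmt"
	"os"
)

// globalSchedules holds the cluster-wide cron schedules for the maintenance
// jobs. The environment variable names below are hypothetical and only show
// how an administrator could configure them.
type globalSchedules struct {
	Prune string
	Check string
}

func loadGlobalSchedules() globalSchedules {
	return globalSchedules{
		// Fall back to a randomized weekly slot if the administrator sets nothing.
		Prune: getenvOrDefault("K8UP_GLOBAL_PRUNE_SCHEDULE", "@weekly-random"),
		Check: getenvOrDefault("K8UP_GLOBAL_CHECK_SCHEDULE", "@weekly-random"),
	}
}

func getenvOrDefault(key, fallback string) string {
	if v, ok := os.LookupEnv(key); ok {
		return v
	}
	return fallback
}

func main() {
	s := loadGlobalSchedules()
	fmt.Printf("prune: %s, check: %s\n", s.Prune, s.Check)
}
```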

Impact

  • Users would not need to define a Prune and Check spec in the Schedule anymore, though they still can if they want to run an additional Prune or Check.
  • K8up Administrators basically define a Prune and Check for every repository, whether the users want to or not.
  • The prune or check job will be scheduled in an arbitrary namespace, though it is always a namespace in which one of the respective Schedules exists.

Out of Scope

  • This is to some extent an alternative implementation of #344

Further links

  • tbd

Acceptance criteria

  • Given a new Schedule object with a new restic repository, when reconciling it, then schedule a job for the specified repository.
  • Given an existing Schedule object, when deleting it, then remove the cron schedule if it was the last Schedule referencing that repository.
  • Given an existing Schedule object, when the repository changes, then ensure that the new repository also gets pruned and checked and the old one is removed.
  • Given a Schedule object that defines Prune or Check on its own, when reconciling it, then schedule those jobs in addition to the global jobs.

Implementation Ideas

  • Given that EffectiveSchedules were introduced in v1.0 as a preparation for deduplication, we can remove them again and put the effectiveSchedules into the Status fields.
  • At first this looks very much like the deduplication in #344, with the only difference being that the schedule comes from a global setting. However, in this case I would propose to keep the deduplication entirely within K8up itself, in the internal scheduler. That way we would not be constrained by the CRD data structures.
  • In order to balance Prunes and Checks within the cluster, we could allow only randomized schedules for those and use the repository string as the seed. That way, @weekly-random for example ensures that 200 Prunes don't all get scheduled at the same time, but are distributed across the cluster, e.g. repository A on a Tuesday night, repository B on a Saturday morning. Since the randomizer uses a seed, the schedule should stay stable across K8up restarts (see the first sketch after this list).
  • To internally track which Prunes and Checks to run, we could use two map[string][]types.NamespacedName data objects (one for Checks, one for Prunes). The keys are the repository strings, and the values are references to the Schedules that have configured that repository. The scheduler would only need to run jobs in the namespace of the first entry of that array (see the second sketch after this list).
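A sketch of the seeded randomization idea: the repository string is hashed and used to seed a deterministic pseudo-random generator, so the derived weekly slot stays stable across operator restarts but differs between repositories. The function name is illustrative only, not existing K8up code:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

// randomScheduleFor derives a stable, pseudo-random weekly cron schedule from
// the repository string. Because the repository is used as the seed, the same
// repository always ends up in the same slot, while different repositories are
// spread over the week.
func randomScheduleFor(repository string) string {
	h := fnv.New64a()
	h.Write([]byte(repository))
	r := rand.New(rand.NewSource(int64(h.Sum64())))

	minute := r.Intn(60)
	hour := r.Intn(24)
	weekday := r.Intn(7)
	// Standard cron format: minute hour day-of-month month day-of-week.
	return fmt.Sprintf("%d %d * * %d", minute, hour, weekday)
}

func main() {
	fmt.Println(randomScheduleFor("s3:https://endpoint/bucket-a")) // stable slot for repo A
	fmt.Println(randomScheduleFor("s3:https://endpoint/bucket-b")) // a different, but equally stable slot
}
```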
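And a sketch of the proposed internal bookkeeping, assuming two maps keyed by the repository string. The repoTracker type and its methods are hypothetical names for illustration; only types.NamespacedName is the real Kubernetes type:

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/types"
)

// repoTracker keeps one entry per restic repository: the key is the repository
// string, the value lists all Schedules that reference it. This is only a
// sketch of the in-memory bookkeeping, not actual K8up code.
type repoTracker struct {
	prunes map[string][]types.NamespacedName
	checks map[string][]types.NamespacedName
}

func newRepoTracker() *repoTracker {
	return &repoTracker{
		prunes: map[string][]types.NamespacedName{},
		checks: map[string][]types.NamespacedName{},
	}
}

// register records that the given Schedule uses the repository.
func (t *repoTracker) register(repository string, schedule types.NamespacedName) {
	t.prunes[repository] = append(t.prunes[repository], schedule)
	t.checks[repository] = append(t.checks[repository], schedule)
}

// namespaceForJob returns the namespace in which the global prune/check job
// for the repository should run: the namespace of the first registered Schedule.
func (t *repoTracker) namespaceForJob(repository string) (string, bool) {
	refs := t.prunes[repository]
	if len(refs) == 0 {
		return "", false
	}
	return refs[0].Namespace, true
}

func main() {
	t := newRepoTracker()
	t.register("s3:https://endpoint/bucket-a", types.NamespacedName{Namespace: "team-a", Name: "schedule-1"})
	t.register("s3:https://endpoint/bucket-a", types.NamespacedName{Namespace: "team-b", Name: "schedule-2"})

	if ns, ok := t.namespaceForJob("s3:https://endpoint/bucket-a"); ok {
		fmt.Println("run global prune/check in namespace:", ns) // team-a
	}
}
```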

tobru · Feb 25 '21 15:02

@schnitzel

Thanks for the call today! I have refined this issue based on what we discussed. Would you mind reading it carefully and giving feedback?

ccremer · Feb 25 '21 16:02