Ability to run scripts periodically inside existing jobs
Proposal
A mechanism to run scripts inside a task periodically, either at a fixed interval or on a cron schedule. For example, when running a Postgres task, we need a way to dump the databases periodically for backup.
Use-cases
One particular use case is to create database dumps for backups.
Attempted Solutions
Change scripts were recently introduced for templates. One hacky workaround is to attach a change_script to a template and force the template to re-render periodically, so that the script is executed on each re-render.
Prior art(s)
- https://awesomeprogrammer.com/blog/2022/06/05/how-to-backup-postgres-database-with-nomad/
- https://andydote.co.uk/2021/11/22/nomad-operator-pattern/
Hi @blmhemu, that's an interesting idea, especially since, as you point out, Nomad can already run scripts in the context of individual tasks. I do wonder, though, whether there are substantial use cases that couldn't be covered by the existing support for periodic jobs, which, unlike these scripts, would have the full capabilities of the Nomad scheduler behind them.
Hi @shoenig, I added prior art to the post. It uses a somewhat similar mechanism built on the existing periodic job type.
@blmhemu Right now we also use a rather hacky approach: special periodic jobs that run commands against selected allocations using nomad alloc exec. It's doable but somewhat tedious.
Having a periodic {} stanza for individual tasks in a task group or perhaps a periodic hook for lifecycle {} would probably be a better option :grin:
Some of the use cases we try to cover with this approach are:
- consistent database backups.
- running database maintenance jobs.
- running maintenance jobs on certain CSI volumes (removing / compressing / archiving data, etc).
- vulnerability and/or compliance scans of running allocations.
- etc.
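The periodic-job workaround described above boils down to looping over allocations and calling nomad alloc exec -task for each one. A minimal sketch of the script such a periodic job might run, assuming the allocation IDs, the task name, and the command are supplied by the caller (how allocations are "selected" is not described in the thread, so that part is left out); the helper names are hypothetical:

```python
import shlex
import subprocess


def alloc_exec_argv(alloc_id, task, command):
    """Build the argv for `nomad alloc exec`, targeting one task of an allocation."""
    # -task picks the task inside the allocation; everything after the
    # allocation ID is passed through as the command to run in that task.
    return ["nomad", "alloc", "exec", "-task", task, alloc_id, *command]


def run_in_allocs(alloc_ids, task, command, dry_run=True):
    """Run `command` inside `task` for each allocation ID (hypothetical helper).

    Returns a list of (alloc_id, returncode); returncode is None in dry-run
    mode, where the commands are only printed.
    """
    results = []
    for alloc_id in alloc_ids:
        argv = alloc_exec_argv(alloc_id, task, command)
        if dry_run:
            print(shlex.join(argv))
            results.append((alloc_id, None))
        else:
            proc = subprocess.run(argv)
            results.append((alloc_id, proc.returncode))
    return results
```

For example, run_in_allocs(ids, "postgres", ["pg_dumpall", "-U", "postgres"], dry_run=False) would exec a dump in every listed allocation; the task name and dump command here are placeholders.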
I'd love to know your implementation details, if possible. Do you use the exec or raw_exec driver?
Just jotting down some thoughts on this notion of "local" periodic tasks:
- The client agent can stop for arbitrary amounts of time while the tasks for an allocation are left running. When the agent comes back up, there's no way for us to know whether the clock skewed while we were down, so even if we recorded the last run we'd be at risk of incorrect periodicity. (Which might be OK if we documented it as "do not rely on this for accurate timing!!!")
- Likewise, what happens if the interval passes while the client is down?
- The scheduler doesn't start tasks; it creates allocations that are then turned into running tasks by the client. So although the server could drive the periodicity, we don't currently have a way of dispatching task-level work. But we might be able to repurpose something like alloc restart -task to get the actual task running.
- If we have a periodic sidecar task, we'd need to account for the resources used by that task when scheduling, which means any allocation with one of these sidecar tasks would be reserving memory that it only actually uses while the periodic sidecar is running.
- To avoid that resource overhead, we'd have to run a tasklet like we do for script checks (as @shoenig noted).
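To make the clock-skew and missed-interval concerns above concrete, here is a sketch of the decision a client-local scheduler would have to make after the agent restarts. It is a hypothetical policy, not anything Nomad implements: if the clock appears to have moved backwards, the recorded last run is distrusted; if several intervals passed while the agent was down, they are coalesced into a single run.

```python
from datetime import datetime, timedelta


def plan_run(last_run, now, interval):
    """Decide whether a local periodic script should run now (hypothetical policy).

    Returns (run_now, next_due).
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    if now < last_run:
        # Clock went backwards: the recorded last_run can't be trusted,
        # so skip this round and schedule relative to the current time.
        return False, now + interval
    missed = (now - last_run) // interval  # whole intervals elapsed
    if missed == 0:
        return False, last_run + interval
    # One or more runs were missed (e.g. the agent was down):
    # run once now and keep the original schedule alignment.
    return True, last_run + (missed + 1) * interval
```

Other policies (run every missed interval, or skip missed runs entirely) are equally plausible; the point is that any of them depends on a last-run timestamp that, as noted above, may be wrong after a restart.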
So if you're OK with imprecise timing across client restarts and you want the script to run inside the task's resources, what you typically want is a script check with a long interval. That also gives you the bonus of reporting to Consul/Nomad service discovery whether the script was healthy on its last run.
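Script checks report status through the command's exit code: 0 is passing, 1 is warning, and anything else is critical. A sketch of a wrapper a long-interval script check could invoke to run a backup and translate the outcome into those codes; the wrapper name and the "stderr output means warning" rule are assumptions for illustration, and the actual backup command (for example a pg_dump invocation) is supplied by the caller:

```python
import subprocess
import sys

# Exit codes understood by script checks: 0 passing, 1 warning, other critical.
PASSING, WARNING, CRITICAL = 0, 1, 2


def run_backup(cmd):
    """Run the backup command and map its outcome to (exit_code, message)."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except OSError as exc:
        # The backup tool could not be started at all (e.g. not installed).
        return CRITICAL, f"could not start backup: {exc}"
    if result.returncode != 0:
        return CRITICAL, result.stderr.strip() or f"exit code {result.returncode}"
    if result.stderr.strip():
        # Hypothetical rule: success with diagnostics on stderr is a warning.
        return WARNING, result.stderr.strip()
    return PASSING, "backup completed"
```

An entry point for the check would call status, msg = run_backup(sys.argv[1:]), print msg, and exit with sys.exit(status). The check's timeout presumably has to be long enough for the dump to finish.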
Somewhat related: https://github.com/hashicorp/nomad/issues/2617
Can we close this or 2617 to concentrate discussion in one place?
@EugenKon they're two different feature requests. This is for running a script as an exec inside a running task, whereas #2617 is about running a subset of tasks as periodic tasks.