zrepl
Option to enable/disable/pause replication functionality of a job at runtime
I would like a way to ask zrepl to continue taking snapshots but not attempt or permit any pushing or pulling.
I often travel for work, and this involves expensive, low quality internet and many hours between charges.
It would be easy to simply turn off zrepl, but I don't want to lose the incremental snapshots.
However, I don't want zrepl to decide when it is okay or when it isn't: sometimes I am on AC power with cheap, high quality internet -- but with a network policy incompatible with snapshots being transferred. For this reason, I would like a way to send zrepl a signal that it is okay or not.
One way could be a message sent to the daemon to start / stop IO. Another could be a separate process which handles the network activity, leaving the snapshotting to the existing daemon. Not sure. What do you think?
I suppose another way could be to turn on and off a firewall or interface to simply prevent it from using the network.
Thanks for this well-written feature request, I am particularly happy that you explained your use case so well!
> It would be easy to simply turn off zrepl, but I don't want to lose the incremental snapshots.

zrepl (at least from the upcoming 0.3 release onward) guarantees that incremental replication will always be possible unless you fiddle with the bookmarks it manages. I guess you are referring to the periodic snapshots + pruning?
> However, I don't want zrepl to decide when it is okay or when it isn't: sometimes I am on AC power with cheap, high quality internet -- but with a network policy incompatible with snapshots being transferred. For this reason, I would like a way to send zrepl a signal that it is okay or not.
>
> One way could be a message sent to the daemon to start / stop IO. Another could be a separate process which handles the network activity, leaving the snapshotting to the existing daemon. Not sure. What do you think?
This is something @janisstreib and I have been thinking about a bit lately. I personally would like to see some fancy NetworkManager integration right in zrepl, but your proposal to just have a CLI switch to enable / disable the replication functionality of a job (without affecting the snapshot management) seems useful and quite easy to implement.
Existing Functionality
The zrepl signal wakeup JOB and zrepl signal reset JOB commands obviously don't fully address your use case because they do not set state in the daemon - they merely trigger or cancel a replicate-prune-cycle of a push/pull job.
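To make the distinction concrete, here is a minimal sketch of what "setting state in the daemon" could look like, as opposed to a one-shot trigger. Everything in it is an assumption for illustration: `replicationGate`, `SetPaused`, `MayReplicate`, and the job names are hypothetical and do not exist in zrepl.

```go
package main

import (
	"fmt"
	"sync"
)

// replicationGate is a hypothetical per-job switch held in daemon memory.
// It is NOT part of zrepl; it only illustrates state that outlives a
// single wakeup/reset signal.
type replicationGate struct {
	mu     sync.Mutex
	paused map[string]bool // job name -> is replication paused?
}

func newReplicationGate() *replicationGate {
	return &replicationGate{paused: make(map[string]bool)}
}

// SetPaused records the operator's decision for one job.
func (g *replicationGate) SetPaused(job string, paused bool) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.paused[job] = paused
}

// MayReplicate reports whether a replicate-prune cycle may start for job.
// Jobs that were never paused default to "allowed".
func (g *replicationGate) MayReplicate(job string) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	return !g.paused[job]
}

func main() {
	gate := newReplicationGate()
	gate.SetPaused("push_home", true) // hypothetical job name

	for _, job := range []string{"push_home", "push_other"} {
		fmt.Printf("%s: replication allowed = %v\n", job, gate.MayReplicate(job))
	}
}
```

In this sketch the snapshotting side would never consult the gate, so periodic snapshots would continue while replication is paused.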
My suggestion to you is that you configure a snap job that takes care of snapshot management and a push job that just does replication. It feels a bit hacky and will be improved in upcoming releases.
I have a WIP commit to the zrepl documentation that will document such typical laptop/workstation use-cases, but ATM your best bet is to experiment yourself or ask @janisstreib to post his zrepl config here ;)
Sure. Here is my current laptop config, which snapshots everything and backs up just my home dataset:
global:
  logging:
    # use syslog instead of stdout because it makes journald happy
    - type: syslog
      format: human
      level: warn
jobs:
  - name: snapjob
    type: snap
    filesystems: {
      "pool<": true,
    }
    snapshotting:
      type: periodic
      interval: 15m
      prefix: zrepl_
    pruning:
      keep:
        - type: grid
          grid: 1x1h(keep=all) | 24x1h | 14x1d
          regex: "^zrepl_.*"
  - name: <my push job>
    send:
      encrypted: true
    type: push
    connect:
      type: ssh+stdinserver
      host: <my target>
      user: root
      port: 22
      options:
        - ProxyJump=<my jumphost>
      identity_file: /etc/zrepl/zrepl.key
    filesystems: {
      "pool/<my_home>": true
    }
    snapshotting:
      type: manual
    pruning:
      keep_sender:
        - type: regex
          regex: ".*"
      keep_receiver:
        - type: grid
          grid: 1x1h(keep=all) | 24x1h | 360x1d
          regex: "^zrepl_.*"
@grahamc please note that you might want to add additional keep rules to the pruning config if you have non-zrepl-managed snapshots on the dataset that you would like to keep. **The above configuration will destroy all snapshots that don't have the zrepl_ prefix, and apply the grid pruning policy to those that have this prefix.** See https://zrepl.github.io/configuration/prune.html
I think the main problem with this is the following scenario:
1. Use the new functionality to stop replication of a job.
2. systemctl restart zrepl
3. ???
In (3), is replication allowed again? If not, where should zrepl remember that replication of a job was stopped? We currently don't have any persistent state for a job (outside of replication cursors and stuff like that).
Would there be any downside to just storing the state as a property on the dataset(s)?
> Would there be any downside to just storing the state as a property on the dataset(s)?
Yes. First of all: Which datasets? All of them? Pls no. Jobs have a list of datasets (dataset filter), but that's not their identity and can easily change between job runs (user deletes dataset) or restarts (user changes config and/or deletes datasets). Also, datasets can be part of multiple jobs, so we'd need one property per job.
I'd much, much prefer an entirely machine-managed, best-effort parsed-or-discarded, /etc/zrepl/state.json to persist this. It's not perfect, but already a better basis for discussion, IMHO.
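A rough sketch of what "best-effort parsed-or-discarded" could mean in practice: if the file decodes cleanly, use it; on any problem, silently fall back to an empty state. The schema (`jobState`, the `replication_paused` key) and the function name are assumptions made up for this example; zrepl defines no such file format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jobState is a hypothetical schema for a machine-managed state file.
type jobState struct {
	ReplicationPaused map[string]bool `json:"replication_paused"`
}

// parseStateOrDiscard decodes the persisted state. Any decoding problem
// (truncated file, wrong types, empty input) yields an empty state in
// which nothing is paused, instead of an error.
func parseStateOrDiscard(raw []byte) jobState {
	var s jobState
	if err := json.Unmarshal(raw, &s); err != nil {
		s = jobState{} // discard whatever was partially decoded
	}
	if s.ReplicationPaused == nil {
		s.ReplicationPaused = map[string]bool{}
	}
	return s
}

func main() {
	good := []byte(`{"replication_paused":{"push_home":true}}`)
	bad := []byte(`{"replication_paused": tru`) // truncated write

	fmt.Println(parseStateOrDiscard(good).ReplicationPaused["push_home"])
	fmt.Println(parseStateOrDiscard(bad).ReplicationPaused["push_home"])
}
```

Keying the map by job name rather than by dataset sidesteps the problems listed above: datasets can come and go or belong to several jobs, while the state stays attached to the job.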
> 2. systemctl restart zrepl: is replication allowed again?
How about a classic "whatever you want"?
persist_jobstate: true # defaults to false
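One possible reading of this proposal, sketched as a tiny decision function. The option does not exist in zrepl; `pausedAfterRestart` is a hypothetical name, and the assumption that the default (`false`) means "forget the pause on restart" is my interpretation of the comment above.

```go
package main

import "fmt"

// pausedAfterRestart decides whether a job's replication is still paused
// once the daemon comes back up. persistJobstate mirrors the proposed
// (hypothetical) persist_jobstate option, which defaults to false.
func pausedAfterRestart(persistJobstate, pausedBeforeRestart bool) bool {
	if !persistJobstate {
		// State is not persisted: the pause is forgotten and
		// replication is allowed again after the restart.
		return false
	}
	return pausedBeforeRestart
}

func main() {
	for _, persist := range []bool{false, true} {
		fmt.Printf("persist_jobstate=%v -> paused after restart: %v\n",
			persist, pausedAfterRestart(persist, true))
	}
}
```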