zrepl option to enable/disable/pause replication functionality of a job at runtime

I would like a way to ask zrepl to continue taking snapshots but not attempt or permit any pushing or pulling.

I often travel for work, and this involves expensive, low quality internet and many hours between charges.

It would be easy to simply turn off zrepl, but I don't want to lose the incremental snapshots.

However, I don't want zrepl to decide when it is okay or when it isn't: sometimes I am on AC power with cheap, high quality internet -- but with a network policy incompatible with snapshots being transferred. For this reason, I would like a way to send zrepl a signal that it is okay or not.

One way could be a message sent to the daemon to start / stop IO. Another could be a separate process which handles the network activity, leaving the snapshotting to the existing daemon. Not sure. What do you think?

Jun 13 '20 01:06 grahamc

I suppose another way could be to turn on and off a firewall or interface to simply prevent it from using the network.

Jun 13 '20 01:06 grahamc

Thanks for this well-written feature request, I am particularly happy that you explained your use case so well!

It would be easy to simply turn off zrepl, but I don't want to lose the incremental snapshots.

zrepl (at least from the upcoming 0.3 release onward) guarantees that incremental replication will always be possible unless you fiddle around with the bookmarks it manages. I guess you are referring to the periodic snapshots + pruning?

However, I don't want zrepl to decide when it is okay or when it isn't: sometimes I am on AC power with cheap, high quality internet -- but with a network policy incompatible with snapshots being transferred. For this reason, I would like a way to send zrepl a signal that it is okay or not.

One way could be a message sent to the daemon to start / stop IO. Another could be a separate process which handles the network activity, leaving the snapshotting to the existing daemon. Not sure. What do you think?

This is something @janisstreib and I have been thinking about a bit lately. I personally would like to see some fancy NetworkManager integration right in zrepl, but your proposal to just have a CLI switch to enable / disable the replication functionality of a job (without affecting the snapshot management) seems useful and quite easy to implement.

Existing Functionality

The zrepl signal wakeup JOB and zrepl signal reset JOB commands obviously don't fully address your use case because they do not set state in the daemon - they merely trigger or cancel a replicate-prune-cycle of a push/pull job.

My suggestion to you is that you configure a snap job that takes care of snapshot management and a push job that just does replication. It feels a bit hacky and will be improved in upcoming releases. I have a WIP commit to the zrepl documentation that will document such typical laptop/workstation use-cases, but ATM your best bet is to experiment yourself or ask @janisstreib to post his zrepl config here ;)

Jun 13 '20 10:06 problame

Sure. My current laptop config, which snapshots everything and backs up just my home dataset:

global:
  logging:
    # use syslog instead of stdout because it makes journald happy
    - type: syslog
      format: human
      level: warn
jobs:
- name: snapjob
  type: snap
  filesystems: {
    "pool<": true,
  }
  snapshotting:
    type: periodic
    interval: 15m
    prefix: zrepl_
  pruning:
    keep:
      - type: grid
        grid: 1x1h(keep=all) | 24x1h | 14x1d
        regex: "^zrepl_.*"
- name: <my push job>
  send:
    encrypted: true
  type: push
  connect:
    type: ssh+stdinserver
    host: <my target>
    user: root
    port: 22
    options:
      - ProxyJump=<my jumphost>
    identity_file: /etc/zrepl/zrepl.key
  filesystems: {
    "pool/<my_home>": true
  }
  snapshotting:
    type: manual
  pruning:
    keep_sender:
    - type: regex
      regex: ".*"
    keep_receiver:
      - type: grid
        grid: 1x1h(keep=all) | 24x1h | 360x1d
        regex: "^zrepl_.*"

Jun 13 '20 16:06 janisstreib

@grahamc please note that you might want to add additional keep rules to the pruning config if you have non-zrepl-managed snapshots on the dataset that you would like to keep. The above configuration will destroy all snapshots that don't have the zrepl_ prefix, and apply the grid pruning policy to those that have this prefix. Ref: https://zrepl.github.io/configuration/prune.html

Jun 14 '20 09:06 problame

I think the main problem with this is the following scenario:

1. Use the new functionality to stop replication of a job.
2. systemctl restart zrepl
3. ???

In (3), is replication allowed again? If not, where should zrepl remember that replication of a job was stopped? We currently don't have any persistent state for a job (outside of replication cursors and stuff like that).

Feb 05 '22 10:02 problame

Would there be any downside to just storing the state as a property on the dataset(s)?

Feb 05 '22 20:02 cole-h

Would there be any downside to just storing the state as a property on the dataset(s)?

Yes. First of all: Which datasets? All of them? Pls no. Jobs have a list of datasets (dataset filter), but that's not their identity and can easily change between job runs (user deletes dataset) or restarts (user changes config and/or deletes datasets). Also, datasets can be part of multiple jobs, so we'd need one property per job.

I'd much, much prefer an entirely machine-managed, best-effort parsed-or-discarded, /etc/zrepl/state.json to persist this. It's not perfect, but already a better basis for discussion, IMHO.
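To make "best-effort parsed-or-discarded" concrete, here is a minimal sketch in Python (not zrepl code, and not a proposal for the actual file format). The `paused_jobs` key and both function names are assumptions made up for illustration. The idea is that a missing or malformed state file simply means "nothing is paused", and writes go through a temporary file so a crash never leaves a half-written state behind.

```python
import json
import os
import tempfile


def load_paused_jobs(path):
    """Return the set of job names whose replication is paused.

    Best-effort: a missing, unreadable, or malformed file is treated as
    "no job is paused" instead of being reported as an error.
    """
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        paused = data["paused_jobs"]  # hypothetical key, not a zrepl format
        if not isinstance(paused, list) or not all(isinstance(j, str) for j in paused):
            raise ValueError("unexpected schema")
        return set(paused)
    except (OSError, ValueError, KeyError, TypeError):
        # Discard anything we cannot parse; the daemon starts with defaults.
        return set()


def save_paused_jobs(path, paused):
    """Write the paused-job set atomically (temp file + rename)."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".state-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump({"paused_jobs": sorted(paused)}, f)
        os.replace(tmp, path)  # atomic on POSIX when on the same filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

Keying the state by job name rather than by dataset sidesteps the "which datasets?" problem, since the job name is the only identity that stays stable when the dataset filter changes.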

2. systemctl restart zrepl
is replication allowed again?

How about a classic "whatever you want"?

persist_jobstate: true # defaults to false
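Roughly, the semantics I have in mind, sketched in Python. The function name and arguments are made up for illustration, and whether `persist_jobstate` would be a global or a per-job setting is left open here:

```python
def replication_paused_after_restart(job_name, persist_jobstate, persisted_paused_jobs):
    """Decide whether a job starts out paused after the daemon restarts.

    persist_jobstate      -- the (hypothetical) config option; False by default
    persisted_paused_jobs -- job names recorded as paused before the restart
    """
    if not persist_jobstate:
        # Default behaviour: a restart forgets the pause, replication resumes.
        return False
    return job_name in persisted_paused_jobs
```

With the default of false, a restart behaves as it does today; only users who opt in get the "stays paused until I say otherwise" behaviour.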

Feb 06 '22 03:02 InsanePrawn
