ssh push job example

Open mdimura opened this issue 3 years ago • 3 comments

I am trying to set up zrepl over ssh with a push+sink configuration and can't quite make it work. I've configured the sender with ssh+stdinserver:

jobs:
- name: vms_to_central
  type: push
  connect:
    type: ssh+stdinserver
    host: 192.168.10.11
    user: root
    port: 22
    identity_file: /root/.ssh/id_rsa
...      

and receiver:

jobs:
- name: dus_vms_sink
  type: sink
  root_fs: "backups"
  serve:
    type: stdinserver
    client_identities:
    - "<sender_hostname>"

I guess there is some obvious mistake in the config, but I can't see it. It works with tcp, but I would prefer ssh. Sorry if there is an answer in the docs; I could not find it. Is it possible to set up a push+sink config via ssh?

mdimura avatar Mar 12 '21 15:03 mdimura

Please provide the error messages and/or log output that you are encountering.

Is it possible to set up a push+sink config via ssh?

Yes, it's definitely possible.

Please familiarize yourself with https://zrepl.github.io/configuration/transports.html#ssh-stdinserver-transport
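
For orientation, the central piece of that page is a forced-command entry in authorized_keys on the receiving side. Below is a minimal sketch, assuming the client identity is the sender's hostname and the key is the public half of the identity_file from the push job. The helper name authorized_keys_line is hypothetical and the key is a placeholder, not real key material:

```shell
# Hypothetical helper: prints one authorized_keys line for the receiver's SSH user.
# $1 = client identity (should match an entry in client_identities of the sink job)
# $2 = the sender's public key, e.g. the contents of /root/.ssh/id_rsa.pub
authorized_keys_line() {
    printf 'command="zrepl stdinserver %s",restrict %s\n' "$1" "$2"
}

# Example with placeholder values:
authorized_keys_line "sender-hostname" "ssh-rsa AAAA...placeholder root@sender-hostname"
```

The printed line would go into the authorized_keys file of the user named in the push job's connect section (root here). Older OpenSSH releases may not understand restrict, in which case the docs page lists the individual options to use instead.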

Also, a common problem is that the known_hosts entry must already exist.
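
One way to check this on the sender is sketched below, assuming zrepl runs as root and uses root's default known_hosts file. The helper name known_hosts_name is hypothetical; it only reflects the OpenSSH convention that entries for non-default ports are stored as [host]:port:

```shell
# Hypothetical helper: prints the name under which OpenSSH stores a host
# in known_hosts (plain host for port 22, "[host]:port" otherwise).
known_hosts_name() {
    host=$1
    port=${2:-22}
    if [ "$port" = "22" ]; then
        printf '%s\n' "$host"
    else
        printf '[%s]:%s\n' "$host" "$port"
    fi
}

# Run by hand on the sender, as the user the zrepl daemon runs as (assumed root):
#   ssh-keygen -F "$(known_hosts_name 192.168.10.11 22)"   # look up an existing entry
#   ssh-keyscan -p 22 192.168.10.11                        # show the host key to verify before adding it
```

If the lookup finds nothing, connecting once interactively as that user and accepting the host key should create the entry.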

problame avatar Mar 13 '21 10:03 problame

Thank you for your response. I initially used the ssh-stdinserver-transport example from the docs as the starting point and adapted it from the source+pull scheme to push+sink. My guess was that I messed up the adaptation. Here is the output of systemctl status zrepl:

Mar 15 11:03:28 sender-hostname zrepl[21145]: [vms_to_central][rpc.ctrl][kNoL$aK0q$aK0q.cAaS.4gSr]: cannot connect err="dial_timeout of 10s exceeded"
Mar 15 11:03:28 sender-hostname zrepl[21145]: [vms_to_central][repl][kNoL$aK0q$aK0q.cAaS.4gSr.TUvH./XUf]: error listing receiver filesystems err="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial_timeout of 10s exceeded"" errType="*status.statusError"
Mar 15 11:03:28 sender-hostname zrepl[21145]: [vms_to_central][repl][kNoL$aK0q$aK0q.cAaS.4gSr.TUvH]: most recent error in this attempt attempt_number="0" err="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial_timeout of 10s exceeded""
Mar 15 11:03:28 sender-hostname zrepl[21145]: [vms_to_central][repl][kNoL$aK0q$aK0q.cAaS.4gSr.TUvH]: temporary connectivity-related error identified, start waiting for reconnect deadline="2021-03-15 11:13:28.696991772 +0100 CET m=+1265.390770607" attempt_number="0"
Mar 15 11:03:38 sender-hostname zrepl[21145]: [vms_to_central][rpc.data][kNoL$aK0q$OpTB$OpTB.32PA]: ping failed err="dial_timeout of 10s exceeded"
Mar 15 11:03:38 sender-hostname zrepl[21145]: [vms_to_central][rpc.ctrl][kNoL$aK0q$OpTB$OpTB.32PA]: ping failed err="rpc error: code = Canceled desc = context canceled"
Mar 15 11:03:38 sender-hostname zrepl[21145]: [vms_to_central][repl][kNoL$aK0q$aK0q.cAaS.4gSr.TUvH]: reconnecting failed, aborting run err="receiver is not reachable: control and data rpc failed to respond to ping rpcs" attempt_number="0"
Mar 15 11:03:41 sender-hostname zrepl[21145]: [vms_to_central][rpc.ctrl][kNoL$aK0q$aK0q.cAaS.4gSr]: cannot connect err="dial_timeout of 10s exceeded"

Meanwhile, a manual ssh login to the receiver works fine with key auth.

mdimura avatar Mar 15 '21 10:03 mdimura

A couple of questions:

  • Which zrepl version are you using?
  • Which OS & OpenSSH version are you using?
    • Check specifically whether your OpenSSH version supports the restrict option in the forced-command entry, if you used it (see the tip highlight in the doc page that I linked to)
  • What about the sink? Does it show any logs?
  • The logs indicate a timeout error, which is atypical for config-level errors. Could you check with tcpdump whether the connecting side actually makes a connection attempt?
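
The checks above could be gathered roughly as follows. This is a sketch, assuming a Linux sender (for tcpdump -i any), the host and port from the push job config, and a systemd unit named zrepl on both machines. The helper name ssh_capture_filter is hypothetical:

```shell
# Hypothetical helper: builds a tcpdump filter for SSH traffic to the receiver.
# $1 = receiver address, $2 = SSH port
ssh_capture_filter() {
    printf 'host %s and tcp port %s\n' "$1" "$2"
}

# Values taken from the push job config in this thread.
FILTER=$(ssh_capture_filter 192.168.10.11 22)
echo "tcpdump -ni any '$FILTER'"

# Run these by hand (nothing below is executed by this snippet):
#   zrepl version               # zrepl version, on sender and receiver
#   ssh -V                      # OpenSSH client version
#   journalctl -u zrepl         # sink-side logs on the receiver
#   tcpdump -ni any "$FILTER"   # on the sender, as root, while the job retries
```

If tcpdump shows no packets towards the receiver during a retry, the ssh client is probably never started or fails before connecting. If packets do appear, the problem is more likely on the receiving side.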

problame avatar Mar 22 '21 22:03 problame