implemented option for direct connection via socat and busybox nc

phreaker0 opened this issue 5 years ago • 18 comments

This implements the desired feature to bypass ssh for sending the replication data and use a plain TCP connection instead. Warnings were added, of course, that this option should not be used lightly; the option name alone should be a big hint to the user :-)

An example use case: two servers connected via a common network and via a dedicated link.

syncoid --compress=none --insecure-direct-connection=192.168.32.2:4444 local_pool root@backup:remote_pool

192.168.32.2 is the target host's IP address on the dedicated link, so all the unencrypted data is transferred over that trusted link.

The option can also be used with NATed network topologies by specifying a different listen address:

syncoid --compress=none --insecure-direct-connection=192.168.32.2:4444,10.0.2.4:3333 local_pool root@backup:remote_pool

Why did I use socat and busybox nc? Because they made the implementation really easy and clean.

socat supports connection retrying, which is needed because the listening socket isn't available immediately. And the busybox netcat implementation is the only one I found that can time out on a listening socket, which is needed to abort if the connection doesn't work (firewall, argument error, ...).
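
Conceptually, the resulting pipelines look something like this (a simplified sketch with placeholder dataset names; syncoid adds buffering and other flags, so the exact commands differ):

# receiving side (started first over ssh): listen with a timeout, then receive
busybox nc -l -p 4444 -w 60 | zfs receive remote_pool/dataset

# sending side: send, and let socat retry the connect until the listener is up
zfs send local_pool/dataset@snap | socat - TCP:192.168.32.2:4444,retry=10,interval=1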

Fixes #371

phreaker0 avatar Feb 21 '20 19:02 phreaker0

This is great! Totally saves wasted CPU cycles on two systems connected by a trusted network doing pointless SSH encrypt/decrypt (especially when sending raw encrypted streams).

secabeen avatar Feb 21 '20 22:02 secabeen

Sorry, but how can I install busybox netcat in a clean way on CentOS 7 (via the rpm command)?

devZer0 avatar Mar 07 '20 16:03 devZer0

do we really, really need busybox nc and socat?

couldn't we simply use:

mbuffer -W 10 -I 8888

mbuffer: error: watchdog timeout: input stalled; sending SIGINT

and wrap "mbuffer -O host:port" tries/retries in the syncoid perl-script ?

devZer0 avatar Mar 07 '20 16:03 devZer0

@phreaker0 does this start the netcat listener on the remote side and close it again as needed, or just expect to find an always-on listener?

jimsalterjrs avatar Mar 07 '20 22:03 jimsalterjrs

@devZer0 Nice, I hadn't found out that the mbuffer timeout also works for a non-established connection; for all the other tools I tested (plain netcat, socat, ...) it didn't. So I guess I can exchange busybox netcat for mbuffer (will test this later). But I still need socat for the connection retry options. Doing the retry stuff in Perl (as I first planned) would be way more difficult: I couldn't use the existing single ssh pipe call for send/recv that syncoid uses, so I would need to rewrite much of syncoid's code and it would be more error prone, because I would have to start the server on the target and the client on the source separately and also monitor them somehow, which is difficult without threads. The code would also be much harder to maintain and likely wouldn't get merged.

@jimsalterjrs it will start the netcat listener on the remote side as needed and close it again after the replication finishes or on error

phreaker0 avatar Mar 08 '20 09:03 phreaker0

If you're going to reimplement with mbuffer I'll wait to test.

Is there any value in allowing either mbuffer or netcat as transports, keeping some of your existing work for netcat, or should we do mbuffer only, to minimize maintenance complexity down the road?

jimsalterjrs avatar Mar 08 '20 15:03 jimsalterjrs

I had a little conversation with the socat author/maintainer, asked for a listen-timeout feature in socat and convinced him it could be useful. He sent a patch within 1 day :)

Since it will take some time for such an enhancement to find its way into the major distros, I think there are two ways to proceed:

  1. use socat on the sending and mbuffer on the receiving side
  2. use socat on both sides if the installed socat version is recent enough to have that new feature

@phreaker0, if you'd like to test the socat patch I can forward it to you.

Furthermore, I'm uncomfortable that there is a listener on the receiving side which accepts connections from everywhere for the duration of the transfer.

If socat were used on the receiving side, a security option to restrict access could easily be added (see the "RANGE option group" in https://linux.die.net/man/1/socat).
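
For illustration only (not part of this PR; the dataset name is a placeholder), such a restricted listener could look something like:

socat -u TCP-LISTEN:4444,range=192.168.32.0/24 STDOUT | zfs receive remote_pool/dataset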

devZer0 avatar Mar 09 '20 10:03 devZer0

BTW, I get the following warning:

Use of uninitialized value $sourcehost in string ne at ./syncoid line 128.

devZer0 avatar Mar 10 '20 20:03 devZer0

I get a "Use of uninitialized value $sourcehost in string ne at ./syncoid line 128." warning when using

./syncoid --create-bookmark -r --compress=none --insecure-direct-connection host2:4343 big10 root@host2:big8/big10

--insecure-direct-connection should be able to take just a port and pull the host from the destination argument. That would make LAN backups simpler. Ideally it could automatically pick a free port, but that seems to be non-trivial with busybox nc.

It could also use a command check for socat; if it doesn't exist, the commands just fail repeatedly.

Otherwise it works well for me.

TheLQ avatar Mar 25 '20 15:03 TheLQ

@TheLQ warnings are fixed and command checks are in place.

@devZer0 I'm now using mbuffer for the listening socket instead of busybox nc, but it's nice that socat will have a listen timeout in the future as well. The reason mbuffer didn't work for me at first was the order of arguments: with mbuffer -I 8888 -W 10 it will not time out if there is no connection, only if -W comes before the -I flag. This is documented in the manpage.
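
In other words (just an illustration of the argument ordering, not the exact command syncoid builds):

mbuffer -W 10 -I 8888   # -W before -I: aborts after 10 seconds if no connection arrives
mbuffer -I 8888 -W 10   # -W after -I: does not time out while waiting for a connection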

@jimsalterjrs I don't see a point in supporting busybox nc if mbuffer can do the job as well; you can test now.

phreaker0 avatar Mar 31 '20 07:03 phreaker0

Hmm, it doesn't work on my servers (I only tested with local addresses on my machine); I need to investigate.

phreaker0 avatar Mar 31 '20 07:03 phreaker0

So, mbuffer behaves quite differently from the other listening tools: the address provided to mbuffer isn't used as the listening address but as a source-address whitelist; mbuffer will listen on all network interfaces.

Therefore I switched back to busybox nc as the default and added an option for switching to mbuffer (in which case the specified listen address is used as an IP filter).
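
With mbuffer, the listening side is then roughly the following (an illustrative sketch, not the literal command syncoid builds; the dataset name is a placeholder). Note that mbuffer listens on all interfaces and 192.168.32.1 only filters the allowed source address:

mbuffer -W 60 -I 192.168.32.1:4444 | zfs receive remote_pool/dataset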

I also increased the default timeout to 60 seconds and made it configurable; for some of my datasets with tiny files on rust and lots of metadata changes, zfs send can be so slow that a timeout is triggered.

The command check for busybox nc and mbuffer is done according to the provided options.

examples:

busybox nc, target and listen IP are the same (no NAT):

syncoid --compress=none --insecure-direct-connection=192.168.32.2:4444 local_pool root@backup:remote_pool

busybox nc, target and listen IP are different (NAT):

syncoid --compress=none --insecure-direct-connection=192.168.32.2:4444,10.0.2.4:3333 local_pool root@backup:remote_pool

busybox nc, target and listen IP are the same (no NAT), with a timeout of 120 seconds:

syncoid --compress=none --insecure-direct-connection=192.168.32.2:4444,192.168.32.2:4444,120 local_pool root@backup:remote_pool

mbuffer TCP (192.168.32.1 is the source address), target and listen IP are the same, with a timeout of 120 seconds:

syncoid --compress=none --insecure-direct-connection=192.168.32.2:4444,192.168.32.1:4444,120,mbuffer local_pool root@backup:remote_pool

phreaker0 avatar Apr 17 '20 07:04 phreaker0

Would love to see this for LAN syncs ...

asche77 avatar May 30 '20 21:05 asche77

This looks awesome, looking forward to seeing it merged

geudrik avatar Jul 16 '20 19:07 geudrik

Just wanted to comment that I have tested the insecure connection and it 'works for me' over 10Gb SFP+. I was limited to about 200 MB per second of network transfer with the ssh cipher 'aes128-ctr' and only about 150 MB/s with the default ssh cipher; using the insecure connection I was able to sustain almost 400 MB per second. The data was being transferred between SSDs on both ends.

BTW I just copied the syncoid from https://github.com/phreaker0/sanoid/blob/direct-connection/syncoid and dropped it into /usr/local/bin

and here is the command line I tested:

/usr/local/bin/syncoid  --recursive --no-sync-snap --compress=none --insecure-direct-connection=192.168.1.30:4444,192.168.1.20:4444,120,mbuffer server_ssd root@serverx:serverx_ssd/backup

jim-perkins avatar Aug 10 '20 20:08 jim-perkins

I'm using this and it works fine for me. I merged it with the latest master and fixed the minor conflict from a new option being added; still no problems.

TheLQ avatar Mar 12 '21 01:03 TheLQ

Regarding https://github.com/jimsalterjrs/sanoid/pull/513#issuecomment-596450652: socat since version 1.7.4.0 supports the option "accept-timeout", which makes it suitable for sanoid/syncoid as well.

http://www.dest-unreach.org/socat/doc/socat.html#OPTION_ACCEPT_TIMEOUT

accept-timeout=<timeval>: End waiting for a connection after [timeval] with error status.

http://www.dest-unreach.org/socat/doc/CHANGES

New option accept-timeout (listen-timeout)
Test: ACCEPTTIMEOUT
Proposed by Roland
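
For illustration (assuming socat >= 1.7.4.0 and using placeholder names; this is not what the PR currently does), the receiving side could then be something like:

socat -u TCP-LISTEN:4444,accept-timeout=60,range=192.168.32.0/24 STDOUT | zfs receive remote_pool/dataset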

devZer0 avatar Mar 23 '21 12:03 devZer0

What else is needed for this to be included in the next release? Or, at least, merged to master?

tinsami1 avatar Mar 04 '22 08:03 tinsami1

Does this only work local to remote, or am I using it wrong?

I'm trying remote to local and it fails like this (the resumed interrupted send is because I started it via ssh and then Ctrl-C'd it, but it fails the same way on a clean send).

BTW, here is -w in action from nmap-ncat on OL8, @phreaker0, as a continuation of the previous discussion that I started in the wrong place. I modified $directtimeout from 60 to 10 in syncoid, and you can see nmap-ncat's -w times out in listening mode after 10 seconds:

[remoteuser@dell810 sanoid]$ ./syncoid --debug --insecure-direct-connection=192.168.70.11:12345 --no-sync-snap --sendoptions="-L" --recvoptions="-vu" --compress=none --source-bwlimit=900m --target-bwlimit=900m remoteuser@olvm2:olvm2/vm_uri hddolvm/vm_uri | sed "s/^/$(date '+[%Y-%m-%d %H:%M:%S]') /"
[2023-06-26 17:07:32] DEBUG: SSHCMD: ssh
[2023-06-26 17:07:32] DEBUG: compression forced off from command line arguments.
[2023-06-26 17:07:32] DEBUG: checking availability of socat on source...
[2023-06-26 17:07:32] DEBUG: checking availability of busybox (for nc) on target...
[2023-06-26 17:07:32] DEBUG: checking availability of mbuffer on source...
[2023-06-26 17:07:32] DEBUG: checking availability of mbuffer on target...
[2023-06-26 17:07:32] DEBUG: checking availability of pv on local machine...
[2023-06-26 17:07:32] DEBUG: checking availability of zfs resume feature on source...
[2023-06-26 17:07:32] DEBUG: checking availability of zfs resume feature on target...
[2023-06-26 17:07:32] DEBUG: syncing source olvm2/vm_uri to target hddolvm/vm_uri.
[2023-06-26 17:07:32] DEBUG: getting current value of syncoid:sync on olvm2/vm_uri...
[2023-06-26 17:07:32] ssh      -S /tmp/syncoid-remoteuser@olvm2-1687788452-9889 remoteuser@olvm2 sudo zfs get -H syncoid:sync ''"'"'olvm2/vm_uri'"'"''
[2023-06-26 17:07:32] DEBUG: checking to see if hddolvm/vm_uri on  is already in zfs receive using  ps -Ao args= ...
[2023-06-26 17:07:32] DEBUG: checking to see if target filesystem exists using " sudo zfs get -H name 'hddolvm/vm_uri' 2>&1 |"...
[2023-06-26 17:07:32] DEBUG: getting current value of receive_resume_token on hddolvm/vm_uri...
[2023-06-26 17:07:32]  sudo zfs get -H receive_resume_token 'hddolvm/vm_uri'
[2023-06-26 17:07:32] DEBUG: got receive resume token: 1-f7df1bf1c-e0-789c636064000310a500c4ec50360710e72765a52697303048409460caa7a515a796806426f0c3e4d990e4932a4b528b81b4c3a367ecd8f497e4a79766a630303ce479dc3bb566b9a401923c27583e2f313715684f4e59ae917e596e7c6951a643724e6a621ec23dbc0c08f7e72416a5a726e5e42767e767438519004f7f203e:
[2023-06-26 17:07:32] DEBUG: getting estimated transfer size from source -S /tmp/syncoid-remoteuser@olvm2-1687788452-9889 remoteuser@olvm2 using "ssh      -S /tmp/syncoid-remoteuser@olvm2-1687788452-9889 remoteuser@olvm2 sudo zfs send  -nvP -t 1-f7df1bf1c-e0-789c636064000310a500c4ec50360710e72765a52697303048409460caa7a515a796806426f0c3e4d990e4932a4b528b81b4c3a367ecd8f497e4a79766a630303ce479dc3bb566b9a401923c27583e2f313715684f4e59ae917e596e7c6951a643724e6a621ec23dbc0c08f7e72416a5a726e5e42767e767438519004f7f203e 2>&1 |"...
[2023-06-26 17:07:32] DEBUG: sendsize = 24138379648
[2023-06-26 17:07:32] Resuming interrupted zfs send/receive from olvm2/vm_uri to hddolvm/vm_uri (~ 22.5 GB remaining):
[2023-06-26 17:07:32] DEBUG: ssh      -S /tmp/syncoid-remoteuser@olvm2-1687788452-9889 remoteuser@olvm2 'sudo zfs send  -t 1-f7df1bf1c-e0-789c636064000310a500c4ec50360710e72765a52697303048409460caa7a515a796806426f0c3e4d990e4932a4b528b81b4c3a367ecd8f497e4a79766a630303ce479dc3bb566b9a401923c27583e2f313715684f4e59ae917e596e7c6951a643724e6a621ec23dbc0c08f7e72416a5a726e5e42767e767438519004f7f203e | mbuffer -R 900m -q -s 128k -m 16M | socat - TCP:192.168.70.11:12345,retry=10,interval=1' |  nc -l 192.168.70.11:12345 -w 10 | mbuffer -r 900m -q -s 128k -m 16M | pv -p -t -e -r -b -s 24138379648 | sudo zfs receive -v -u  -s -F 'hddolvm/vm_uri' 2>&1
Ncat: Could not resolve hostname "192.168.70.11:12345": Name or service not known. QUITTING.
0.00 B 0:00:00 [0.00 B/s] [>                                                                                                     ]  0%
[2023-06-26 17:07:32] cannot receive: failed to read from stream
2023/06/26 17:07:43 socat[28059] E connect(5, AF=2 192.168.70.11:12345, 16): Connection refused
mbuffer: error: outputThread: error writing to <stdout> at offset 0x10000: Broken pipe
mbuffer: warning: error during output to <stdout>: Broken pipe
CRITICAL ERROR: ssh      -S /tmp/syncoid-remoteuser@olvm2-1687788452-9889 remoteuser@olvm2 'sudo zfs send  -t 1-f7df1bf1c-e0-789c636064000310a500c4ec50360710e72765a52697303048409460caa7a515a796806426f0c3e4d990e4932a4b528b81b4c3a367ecd8f497e4a79766a630303ce479dc3bb566b9a401923c27583e2f313715684f4e59ae917e596e7c6951a643724e6a621ec23dbc0c08f7e72416a5a726e5e42767e767438519004f7f203e | mbuffer -R 900m -q -s 128k -m 16M | socat - TCP:192.168.70.11:12345,retry=10,interval=1' |  nc -l 192.168.70.11:12345 -w 10 | mbuffer -r 900m -q -s 128k -m 16M | pv -p -t -e -r -b -s 24138379648 | sudo zfs receive -v -u  -s -F 'hddolvm/vm_uri' 2>&1 failed: 256 at ./syncoid line 629.
[remoteuser@dell810 sanoid]$ grep directtimeout syncoid
my $directtimeout = 10;

mailinglists35 avatar Jun 26 '23 14:06 mailinglists35

@mailinglists35 checking your output, ncat is exiting immediately: 'Ncat: Could not resolve hostname "192.168.70.11:12345": Name or service not known. QUITTING.'

and socat retries 10 times at 1-second intervals and then gives up.

phreaker0 avatar Jun 26 '23 14:06 phreaker0

but I am able to nc from remote to local (local = 192.168.70.11:12345 )

mailinglists35 avatar Jun 26 '23 14:06 mailinglists35

Oh, so $directtimeout is for socat, not for nc? It seems to be used by both nc and socat, though.

mailinglists35 avatar Jun 26 '23 14:06 mailinglists35

oh, sorry, nmap-ncat does not like -l IP:PORT :)
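
For reference, nmap-ncat expects the listen address and port as separate arguments, so the working equivalent should be something like:

nc -w 10 -l 192.168.70.11 12345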

mailinglists35 avatar Jun 26 '23 14:06 mailinglists35

@phreaker0 I see you have $directmbuffer hardcoded; is it usable if I switch it to 1, and how? Will that bypass nc?

mailinglists35 avatar Jun 26 '23 14:06 mailinglists35

OK, I modified my local copy of syncoid to understand nmap-ncat, since there is no busybox in the EL9 repos...

mailinglists35 avatar Jun 26 '23 14:06 mailinglists35