implemented option for direct connection via socat and busybox nc
This implements the desired feature of bypassing ssh for sending the replication data and using a plain TCP connection instead. Warnings were added, of course, that this option should not be used lightly; the parameter name alone should be a big hint to the user :-)
An example use case: Two servers connected via a common network and via a dedicated link.
syncoid --compress=none --insecure-direct-connection=192.168.32.2:4444 local_pool root@backup:remote_pool
192.168.32.2 is the target host's IP address on the dedicated link, so all the unencrypted data is transferred via the dedicated link, which is trusted.
The option can also be used with NATed network topologies by specifying a different listen address:
syncoid --compress=none --insecure-direct-connection=192.168.32.2:4444,10.0.2.4:3333 local_pool root@backup:remote_pool
Why did I use socat and busybox nc? Because they made the implementation really easy and clean.
socat supports connection retrying, which is needed because the listening socket isn't available immediately. And the busybox netcat implementation is the only one I found which can time out on a listening socket, which is needed to abort if the connection doesn't work (firewall, argument error, ...).
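For illustration, the plumbing looks roughly like the sketch below (a simplified picture, not the exact commands syncoid builds; the real pipelines also insert mbuffer and pv, and the addresses, port, timeout and dataset names are placeholders):
# on the target (started over ssh): busybox nc listens with a timeout and feeds zfs receive
busybox nc -l 192.168.32.2:4444 -w 60 | zfs receive remote_pool/dataset
# on the source: zfs send is piped into socat, which retries until the listener is up
zfs send local_pool/dataset@snapshot | socat - TCP:192.168.32.2:4444,retry=10,interval=1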
Fixes #371
This is great! Totally saves wasted CPU cycles on two systems connected by a trusted network doing pointless SSH encrypt/decrypt (especially when sending raw encrypted streams).
Sorry, but how can I install busybox netcat in a clean way on CentOS 7 (via the rpm command)?
do we really, really need busybox nc and socat?
couldn't we simply use:
mbuffer -W 10 -I 8888
mbuffer: error: watchdog timeout: input stalled; sending SIGINT
and wrap "mbuffer -O host:port" tries/retries in the syncoid perl-script ?
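In other words, the mbuffer-only idea would look roughly like this (just a sketch with placeholder host and dataset names, using mbuffer's -I to listen, -O to connect and -W as watchdog timeout):
# receiving side: listen on port 8888 and abort if no data arrives within 10 seconds
mbuffer -W 10 -I 8888 | zfs receive remote_pool/dataset
# sending side: connect out to the listener on the backup host
zfs send local_pool/dataset@snapshot | mbuffer -W 10 -O backup:8888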
@phreaker0 does this start the netcat listener on the remote side and close it again as needed, or just expect to find an always-on listener?
@devZer0 Nice, I didn't know that the mbuffer timeout also works for a non-established connection; for all the other tools I tested it didn't (normal netcat, socat, ...). So I guess I can exchange busybox netcat for mbuffer (will test this later). But I still need socat for the connection retry options. Doing the retry stuff in Perl (as I first planned) would be way more difficult: I can't use the existing single ssh pipe call for send/recv as syncoid does, so I would need to rewrite much of syncoid's code and it would be more error prone, as I would have to start the server on the target and the client on the source separately and also monitor them somehow, which is difficult without threads. The code would also be much harder to maintain and likely wouldn't get merged.
@jimsalterjrs it will start the netcat listener on the remote side as needed and will close it again after the replication or on error
If you're going to reimplement with mbuffer I'll wait to test.
Is there any value to allowing either mbuffer or netcat as transports, keeping some of your existing work for netcat--or should we do mbuffer only, to minimize maintenance complexity down the road?
I had a little conversation with the socat author/maintainer, asked for a listen timeout feature in socat, and convinced him it could be useful. He sent a patch within 1 day :)
As it will take some time for such an enhancement to find its way into major distros, I think there are 2 ways to proceed:
- use socat on the sending and mbuffer on the receiving side
- use socat on both sides if the socat version is >= the version with that new feature
@phreaker0, if you'd like to test the socat patch I can forward it to you
Furthermore, I'm feeling uncomfortable that there is a listener on the receiving side which accepts connections from everywhere for the timeframe of the transfer.
If socat is used on the receiving side, a security option to restrict access could easily be added (see the "RANGE option group" in https://linux.die.net/man/1/socat).
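Something along these lines, for example (a sketch only; subnet, port and dataset are placeholders):
# accept connections only from the trusted transfer subnet, refuse everything else
socat -u TCP-LISTEN:4444,range=192.168.32.0/24 STDOUT | zfs receive remote_pool/dataset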
btw, I get the following warning:
Use of uninitialized value $sourcehost in string ne at ./syncoid line 128.
I get a "Use of uninitialized value $sourcehost in string ne at ./syncoid line 128." warning when using
./syncoid --create-bookmark -r --compress=none --insecure-direct-connection host2:4343 big10 root@host2:big8/big10
--insecure-direct-connection should be able to take just a port and pull the host from the destination argument. Makes LAN backups simpler. Ideally it could automatically pick a free port, but that seems to be non-trivial with busybox nc.
Also, it could use a command check for socat. If it doesn't exist, the commands just repeatedly fail.
Otherwise it works well for me.
@TheLQ warnings are fixed, command checks are in place. @devZer0 I'm now using mbuffer for the listening socket instead of busybox nc, but it's nice that socat will have a listen timeout in the future as well. The reason why mbuffer didn't work for me at first was the order of arguments: if one uses mbuffer -I 8888 -W 10 it will not time out if there is no connection, only if -W comes before the -I flag. This is documented in the manpage.
@jimsalterjrs I don't see a point in supporting busybox nc as well if mbuffer can do the job, you can test now.
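To make the argument-order pitfall explicit (behaviour as observed here; it may depend on the mbuffer version):
# times out after 10 seconds if no client connects, because -W is set before -I
mbuffer -W 10 -I 8888
# does NOT time out while waiting for a connection, because -W comes after -I
mbuffer -I 8888 -W 10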
mhm, it doesn't work on my servers (I only tested with local addresses on my machine), need to investigate
So, mbuffer behaves quite differently from the other listening tools: the address provided to mbuffer isn't used as the listening address but as a source-address whitelist; mbuffer listens on all network interfaces.
Therefore I switched back to busybox nc as the default and added an option for switching to mbuffer (in which case the specified listen address is used as an IP filter).
I also increased the default timeout to 60 seconds and made it configurable; for some of my datasets with tiny files on rust and lots of metadata changes, zfs send can be so slow that a timeout is triggered.
The command check for busybox nc or mbuffer is done according to the provided options.
Examples:
busybox nc, target and listen IP are the same (no NAT):
syncoid --compress=none --insecure-direct-connection=192.168.32.2:4444 local_pool root@backup:remote_pool
busybox nc, target and listen IP are different -> NAT:
syncoid --compress=none --insecure-direct-connection=192.168.32.2:4444,10.0.2.4:3333 local_pool root@backup:remote_pool
busybox nc, target and listen IP are the same (no NAT), with a timeout of 120 seconds:
syncoid --compress=none --insecure-direct-connection=192.168.32.2:4444,192.168.32.2:4444,120 local_pool root@backup:remote_pool
mbuffer tcp (192.168.32.1 is the allowed source address), target and listen IP are the same, with a timeout of 120 seconds:
syncoid --compress=none --insecure-direct-connection=192.168.32.2:4444,192.168.32.1:4444,120,mbuffer local_pool root@backup:remote_pool
Would love to see this for LAN syncs ...
This looks awesome, looking forward to seeing it merged
just wanted to comment that I have tested the insecure connection and it 'works for me' using 10Gb SFP+. I was limited to about 200MB per second network transfer with the ssh cipher 'aes128-ctr', and only about 150MB with the default ssh cipher; using the insecure connection I was able to sustain almost 400MB per second. The data was being transferred between SSDs on both ends.
BTW I just copied the syncoid from https://github.com/phreaker0/sanoid/blob/direct-connection/syncoid and dropped it into /usr/local/bin
and here is the command line tested
/usr/local/bin/syncoid --recursive --no-sync-snap --compress=none --insecure-direct-connection=192.168.1.30:4444,192.168.1.20:4444,120,mbuffer server_ssd root@serverx:serverx_ssd/backup
I'm using this and it works fine for me. Merged with latest master and fixed the minor conflict from a new option being added, still no problems.
regarding https://github.com/jimsalterjrs/sanoid/pull/513#issuecomment-596450652 , socat since version 1.7.4.0 supports the option "accept-timeout", which makes it also suitable for sanoid/syncoid:
http://www.dest-unreach.org/socat/doc/socat.html#OPTION_ACCEPT_TIMEOUT
From http://www.dest-unreach.org/socat/doc/CHANGES :
New option accept-timeout (listen-timeout)
Test: ACCEPTTIMEOUT
Proposed by Roland
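With that option, a receiving-side listener could give up on its own, roughly like this (a sketch assuming socat >= 1.7.4.0; port, timeout and dataset are placeholders):
# abort if no client connects within 60 seconds, then feed the stream into zfs receive
socat -u TCP-LISTEN:4444,accept-timeout=60 STDOUT | zfs receive remote_pool/dataset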
What else is needed for this to be included in the next release? Or, at least, merged to master?
does this only work local to remote? or am I using it wrong?
I'm trying remote to local and it fails like this (the 'resume interrupted' is because I started it via ssh and then CTRL-C'd it, but it fails the same way on a clean send).
BTW here is -w in action from nmap-ncat on OL8, @phreaker0, as a continuation of the previous discussion that I started in the wrong place. I modified $directtimeout to 10 from 60 in syncoid and you can see nmap-ncat -w times out in listening mode after 10 seconds:
[remoteuser@dell810 sanoid]$ ./syncoid --debug --insecure-direct-connection=192.168.70.11:12345 --no-sync-snap --sendoptions="-L" --recvoptions="-vu" --compress=none --source-bwlimit=900m --target-bwlimit=900m remoteuser@olvm2:olvm2/vm_uri hddolvm/vm_uri | sed "s/^/$(date '+[%Y-%m-%d %H:%M:%S]') /"
[2023-06-26 17:07:32] DEBUG: SSHCMD: ssh
[2023-06-26 17:07:32] DEBUG: compression forced off from command line arguments.
[2023-06-26 17:07:32] DEBUG: checking availability of socat on source...
[2023-06-26 17:07:32] DEBUG: checking availability of busybox (for nc) on target...
[2023-06-26 17:07:32] DEBUG: checking availability of mbuffer on source...
[2023-06-26 17:07:32] DEBUG: checking availability of mbuffer on target...
[2023-06-26 17:07:32] DEBUG: checking availability of pv on local machine...
[2023-06-26 17:07:32] DEBUG: checking availability of zfs resume feature on source...
[2023-06-26 17:07:32] DEBUG: checking availability of zfs resume feature on target...
[2023-06-26 17:07:32] DEBUG: syncing source olvm2/vm_uri to target hddolvm/vm_uri.
[2023-06-26 17:07:32] DEBUG: getting current value of syncoid:sync on olvm2/vm_uri...
[2023-06-26 17:07:32] ssh -S /tmp/syncoid-remoteuser@olvm2-1687788452-9889 remoteuser@olvm2 sudo zfs get -H syncoid:sync ''"'"'olvm2/vm_uri'"'"''
[2023-06-26 17:07:32] DEBUG: checking to see if hddolvm/vm_uri on is already in zfs receive using ps -Ao args= ...
[2023-06-26 17:07:32] DEBUG: checking to see if target filesystem exists using " sudo zfs get -H name 'hddolvm/vm_uri' 2>&1 |"...
[2023-06-26 17:07:32] DEBUG: getting current value of receive_resume_token on hddolvm/vm_uri...
[2023-06-26 17:07:32] sudo zfs get -H receive_resume_token 'hddolvm/vm_uri'
[2023-06-26 17:07:32] DEBUG: got receive resume token: 1-f7df1bf1c-e0-789c636064000310a500c4ec50360710e72765a52697303048409460caa7a515a796806426f0c3e4d990e4932a4b528b81b4c3a367ecd8f497e4a79766a630303ce479dc3bb566b9a401923c27583e2f313715684f4e59ae917e596e7c6951a643724e6a621ec23dbc0c08f7e72416a5a726e5e42767e767438519004f7f203e:
[2023-06-26 17:07:32] DEBUG: getting estimated transfer size from source -S /tmp/syncoid-remoteuser@olvm2-1687788452-9889 remoteuser@olvm2 using "ssh -S /tmp/syncoid-remoteuser@olvm2-1687788452-9889 remoteuser@olvm2 sudo zfs send -nvP -t 1-f7df1bf1c-e0-789c636064000310a500c4ec50360710e72765a52697303048409460caa7a515a796806426f0c3e4d990e4932a4b528b81b4c3a367ecd8f497e4a79766a630303ce479dc3bb566b9a401923c27583e2f313715684f4e59ae917e596e7c6951a643724e6a621ec23dbc0c08f7e72416a5a726e5e42767e767438519004f7f203e 2>&1 |"...
[2023-06-26 17:07:32] DEBUG: sendsize = 24138379648
[2023-06-26 17:07:32] Resuming interrupted zfs send/receive from olvm2/vm_uri to hddolvm/vm_uri (~ 22.5 GB remaining):
[2023-06-26 17:07:32] DEBUG: ssh -S /tmp/syncoid-remoteuser@olvm2-1687788452-9889 remoteuser@olvm2 'sudo zfs send -t 1-f7df1bf1c-e0-789c636064000310a500c4ec50360710e72765a52697303048409460caa7a515a796806426f0c3e4d990e4932a4b528b81b4c3a367ecd8f497e4a79766a630303ce479dc3bb566b9a401923c27583e2f313715684f4e59ae917e596e7c6951a643724e6a621ec23dbc0c08f7e72416a5a726e5e42767e767438519004f7f203e | mbuffer -R 900m -q -s 128k -m 16M | socat - TCP:192.168.70.11:12345,retry=10,interval=1' | nc -l 192.168.70.11:12345 -w 10 | mbuffer -r 900m -q -s 128k -m 16M | pv -p -t -e -r -b -s 24138379648 | sudo zfs receive -v -u -s -F 'hddolvm/vm_uri' 2>&1
Ncat: Could not resolve hostname "192.168.70.11:12345": Name or service not known. QUITTING.
0.00 B 0:00:00 [0.00 B/s] [> ] 0%
[2023-06-26 17:07:32] cannot receive: failed to read from stream
2023/06/26 17:07:43 socat[28059] E connect(5, AF=2 192.168.70.11:12345, 16): Connection refused
mbuffer: error: outputThread: error writing to <stdout> at offset 0x10000: Broken pipe
mbuffer: warning: error during output to <stdout>: Broken pipe
CRITICAL ERROR: ssh -S /tmp/syncoid-remoteuser@olvm2-1687788452-9889 remoteuser@olvm2 'sudo zfs send -t 1-f7df1bf1c-e0-789c636064000310a500c4ec50360710e72765a52697303048409460caa7a515a796806426f0c3e4d990e4932a4b528b81b4c3a367ecd8f497e4a79766a630303ce479dc3bb566b9a401923c27583e2f313715684f4e59ae917e596e7c6951a643724e6a621ec23dbc0c08f7e72416a5a726e5e42767e767438519004f7f203e | mbuffer -R 900m -q -s 128k -m 16M | socat - TCP:192.168.70.11:12345,retry=10,interval=1' | nc -l 192.168.70.11:12345 -w 10 | mbuffer -r 900m -q -s 128k -m 16M | pv -p -t -e -r -b -s 24138379648 | sudo zfs receive -v -u -s -F 'hddolvm/vm_uri' 2>&1 failed: 256 at ./syncoid line 629.
[remoteuser@dell810 sanoid]$ grep directtimeout syncoid
my $directtimeout = 10;
@mailinglists35 checking your output, ncat is exiting immediately: 'Ncat: Could not resolve hostname "192.168.70.11:12345": Name or service not known. QUITTING.' and socat retries 10 times with 1-second intervals and then gives up.
but I am able to nc from remote to local (local = 192.168.70.11:12345 )
oh, so the $directtimeout is for socat, not for nc? it seems to be used by both nc and socat, though.
oh, sorry, nmap-ncat does not like -l IP:PORT :)
@phreaker0 I see you have $directmbuffer hardcoded; is it usable if I switch it to 1, and how? will that bypass nc?
ok, I modified my local copy of syncoid to understand nmap-ncat, since there is no busybox in EL9 repos...
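For reference, the listener syntax difference that caused the failure above (based on the invocations shown in the debug log; exact flags may vary between netcat implementations):
# busybox-style invocation as generated by syncoid:
nc -l 192.168.70.11:12345 -w 10
# nmap-ncat wants the address and port as separate arguments:
ncat -l 192.168.70.11 12345 -w 10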