Instability on Linux machines
Hi, after many tests on different platforms, kernels, networks, etc., I'm stuck with the same weird issue every time. Here is my setup:

LAN 1 (office):
- A: Windows 11 (ZeroTier One 1.8.6)
- B: Debian 11 (ZeroTier One 1.8.6, built with the debug option), a VM on Windows 11 with a bridged Ethernet adapter

LAN 2 (home):
- C: Windows 11 (ZeroTier One 1.8.6)
- D: Debian 11 (ZeroTier One 1.8.6, built with the debug option), a VM on Windows 11 with a bridged Ethernet adapter
LAN 1 and LAN 2 are separate networks, kilometers apart.
All machines have netcat (on Windows 11 I'm using Ncat from Nmap, as suggested here).
If I start an nc listener on A and connect to it from C (or vice versa), everything works perfectly with relatively low latency. If I start the listener on B and connect from C or D, I'm only sometimes able to send a packet and receive it. The same behavior occurs when the listeners are in LAN 2 (home) and the office ZeroTier nodes act as clients.
It seems that the Windows clients work well in both directions, whilst the Linux clients have problems receiving packets.
On Debian I built ZeroTier from the master branch and ran it from the command line.
In the output I found that both Debian machines continuously request the network configuration (about every 10 seconds), and at the moment I try to connect with nc this appears: `MAC failed for packet 97127342bb532374 from 38b02ee846(xxx.xxx.xxx.xxx/9993)`.
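To see whether these MAC failures cluster on one remote node, the debug output can be saved to a file and summarized per peer. The sketch below is hypothetical (the function name `mac_failures_by_peer` is made up) and assumes every failure line has the shape quoted above, with the 10-hex-digit ZeroTier address directly after `from`.

```shell
#!/bin/sh
# Hypothetical helper: count "MAC failed" lines per remote ZeroTier address.
# Assumes each failure line looks like:
#   MAC failed for packet 97127342bb532374 from 38b02ee846(x.x.x.x/9993)
# Reads the debug log on stdin; prints "<count> <address>", most frequent first.
mac_failures_by_peer() {
    # sed -n ... p prints only lines that matched; the capture group keeps
    # the 10-hex-digit ZeroTier address that follows "from ".
    sed -n 's/.*MAC failed for packet [0-9a-f]* from \([0-9a-f]\{10\}\).*/\1/p' \
        | sort | uniq -c | sort -rn
}
```

For example, `mac_failures_by_peer < zt-debug.log`, where `zt-debug.log` is wherever the debug output was redirected.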
I'm using the controller on my.zerotier.com, where all nodes are authorized and online.
I've also tried older versions: 1.4.4, 1.4.6, 1.8.4, and 1.8.5.
Thanks for reporting this.
Have you tried with just Linux <-> Linux? In these tests I see Windows as a potential variable to control for.
> I've also tried older versions: 1.4.4, 1.4.6, 1.8.4, and 1.8.5.
So you're saying this problem exists with all of these versions?
I've tried it and it seems to work, but with netcat only 3 of every 10 packets I send arrive correctly. I'd like to point out that the behavior is the same in every version I tested, and I've also disabled the Windows firewall.
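One way to put a number on "3 out of 10" is to send numbered lines and count which ones reach the listener. This is a sketch, assuming a netcat that behaves like OpenBSD netcat (`nc -l PORT`); traditional netcat and Ncat use slightly different flags, and `count_delivered`, port 5000, and the file name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper for the netcat test. Example run (adjust nc flags to
# your netcat variant):
#   listener:  nc -l 5000 > received.txt
#   sender:    for i in $(seq 1 10); do echo "$i"; sleep 1; done | nc <listener-zt-ip> 5000
# Then: count_delivered 10 received.txt
count_delivered() {
    total=$1
    file=$2
    # Strip CRs (Windows senders), keep only whole numbers in 1..total,
    # drop duplicates, and count the distinct sequence numbers that arrived.
    got=$(tr -d '\r' < "$file" | grep -xE '[0-9]+' \
        | awk -v n="$total" '$1 >= 1 && $1 <= n' | sort -un | wc -l | tr -d ' ')
    echo "$got/$total delivered"
}
```

Sending one line per second keeps each number in its own packet, so the count reflects individual deliveries rather than one batched write.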
Can you check in your `zerotier-cli peers` output that you have a DIRECT path between your nodes of interest? This sounds like you might be relaying.
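A quick way to do that check is to filter the `zerotier-cli peers` output for relayed peers. This is a sketch, assuming each peer line contains its link type as a separate word (DIRECT or RELAY), as in the 1.8.x `peers` table; the helper name `relayed_peers` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list peers reached via RELAY instead of a DIRECT path.
# Reads `zerotier-cli peers` output on stdin.
relayed_peers() {
    # Look for RELAY as a whole field anywhere after the first column and
    # print the first field, which is the peer's ZeroTier address.
    awk '{ for (i = 2; i <= NF; i++) if ($i == "RELAY") { print $1; break } }'
}
# Usage: zerotier-cli peers | relayed_peers
```

Matching the word rather than a fixed column keeps the filter working if the column order differs between versions.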
I had a similar problem. I first changed the MTU, which helped, but I ended up moving away from ZT1 for that project.
It did work fine for about a year, but broke, I think sometime in 2021.
Hi @joseph-henry. All the peers, including planets and leaves, are DIRECT. I've also tried changing the MTU (2800, 1500, 100, 68) on both VMs as suggested by @qt1, and nothing changed. Is there something else I can try to resolve the problem?
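Rather than trying MTU values by hand, the largest packet that gets through unfragmented can be found by bisection. This sketch assumes Linux iputils `ping`, whose `-M do` forbids fragmentation (busybox `ping` has no such option); `max_payload`, `probe`, and `TARGET` are hypothetical names. Pointing `TARGET` at the remote node's ZeroTier IP tests the overlay; pointing it at its public IP tests the underlying path.

```shell
#!/bin/sh
# Hypothetical helper: bisect the largest payload size for which the probe
# command succeeds. Invariant: "lo" passes, "hi" fails.
# A real probe with iputils ping could be:
#   probe() { ping -c 1 -W 1 -M do -s "$1" "$TARGET" >/dev/null 2>&1; }
# For IPv4, MTU = payload + 28 (20-byte IP header + 8-byte ICMP header).
max_payload() {
    lo=$1      # payload size known to pass
    hi=$2      # payload size known to fail
    probe=$3   # name of the probe command or function
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        if "$probe" "$mid"; then
            lo=$mid
        else
            hi=$mid
        fi
    done
    echo "$lo"
}
# Usage: TARGET=<remote-ip>; max_payload 0 3000 probe
```

Bisection only works if the probe is monotonic (every size below a passing size also passes), which holds for a fixed path MTU.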
Could this be related to https://github.com/zerotier/ZeroTierOne/issues/1422?
> Could this be related to #1422?
Maybe.
Just a shot in the dark, but we've recently fixed a bug that could cause some packets to fail their MAC (validation). This fix is available in dev and I'd be curious if it helps at all.
I have found that ZeroTier 1.8.x is not stable on my RT-AC86U and RT-AX86U routers, whereas 1.4.x does not have these issues. I have set the MTU to 1388, as ZeroTier 1.4.x on the router requires this to function normally.
I have encountered the following problems:
- Sometimes pings (ICMP) succeed from all peers and in both directions, but TCP connections wait for data forever. `zerotier-cli info` shows `ONLINE`, and `listpeers` is all OK. To fix this, I have to restart `zerotier-one`.
- Sometimes `zerotier-cli info` goes `OFFLINE` and will not come back up unless `zerotier-one` is restarted manually. The `zerotier-one -d` process shows as active, but the status remains `OFFLINE`. This problem has been seen since version 1.6.x, but it has become more frequent and significantly worse with version 1.8.x.
I have also found that versions 1.6.x/1.8.x on Windows 10 have performance issues when accessing LAN IPs. For example, when accessing my Emby server or RDP on my NAS (which has the IP address 192.168.9.7) from my Zerotier network (which has the IP address range 10.9.8.0/24), the connection becomes slower and slower until I have to disconnect Zerotier, at which point the network speed immediately returns to normal.
My network topology is as follows:
WIN10 (10.9.8.8|192.168.9.8) <---> Router ax86u (10.9.8.4|192.168.9.1) <---> NAS_WIN10 (192.168.9.7)
My kernel is
Linux RT-AX86U 4.1.52 #2 SMP PREEMPT Fri Mar 25 11:09:29 EDT 2022 aarch64 ASUSWRT-Merlin.
I have to rely on a one-minute crontab script to prevent Zerotier 1.8.x from failing, which is really annoying. I have not come up with a better idea for detecting TCP connection failures.
```shell
# -----------------------------------------
# Example : initCheck
# Argu    : none
# Input   : None
# Return  : None
# sysLOG and baseRoute are helpers defined elsewhere in the full script.
# -----------------------------------------
initCheck() {
    # zerotier-cli info reports ONLINE when the node is up
    if ! zerotier-cli info | grep -qi "online"; then
        sysLOG "ZT OFFLINE! restarting" warning
        /opt/etc/init.d/S91zerotier-one restart
        return 0
    fi
    # Find the first zt* interface (needs a grep with PCRE support, -P)
    ZT_INTERFACE=$(ip -o link show | grep -oP '\d+:\s\Kzt\w+' 2>/dev/null | head -n1)
    # Fallback for greps without -P (e.g. busybox)
    if [ -z "$ZT_INTERFACE" ]; then
        echo "get zt interface empty, try another way"
        ZT_INTERFACE=$(ip -o link show | awk -F': ' '{print $2}' | grep "^zt" | head -n1)
    fi
    # Sometimes the zt device disappears until zerotier-one is restarted
    if [ -z "$ZT_INTERFACE" ]; then
        sysLOG "zt+ dev disappeared! Restarting" warning
        /opt/etc/init.d/S91zerotier-one restart
        return 0
    fi
    # MTU is causing lots of problems, so force it on every run
    ifconfig "$ZT_INTERFACE" mtu 1388
    # Add the base route tables
    baseRoute
}
```
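As for detecting the TCP failures, one option is to add an explicit TCP connection check to the cron job, since ICMP alone does not catch them. This is a hypothetical sketch: `tcp_probe` is a made-up name, it relies on bash's `/dev/tcp` redirection and a coreutils-style `timeout` (some busybox builds expect `timeout -t SECS`), and it only verifies that the TCP handshake completes, so a stall after the connection is established would need a probe that also reads a response.

```shell
#!/bin/sh
# Hypothetical TCP health check: succeed only if a TCP connection to
# HOST:PORT can be opened within SECS seconds (default 5).
tcp_probe() {
    host=$1
    port=$2
    secs=${3:-5}
    # Inside the bash -c script, $0 and $1 are the host and port passed after it.
    # timeout exits non-zero (124) if the connect hangs; a refused or failed
    # connect makes bash exit non-zero as well.
    timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

For example, `tcp_probe 10.9.8.8 3389 5 || /opt/etc/init.d/S91zerotier-one restart` from the router, assuming some TCP service (RDP on port 3389 here) is listening on the WIN10 machine's ZeroTier address.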
> Just a shot in the dark, but we've recently fixed a bug that could cause some packets to fail their MAC (validation). This fix is available in `dev` and I'd be curious if it helps at all.
Doesn't seem to help me.
Is anyone in this ticket still having issues? Since the creation of this ticket there have been two MTU-related fixes:
- #1860 (will be in the next release) lets you set the Ethernet tap's overlay network MTU (default 2800).
- #1844 (will be in the next release) lets you set the physical MTU on a per-link basis using multipath. It will likely be generalized to work without multipath, but you can try things like:
```json
{
  "settings":
  {
    "defaultBondingPolicy": "ab",
    "policies":
    {
      "ab":
      {
        "basePolicy": "active-backup",
        "failoverInterval": 30000,
        "links": {
          "eth0": { "mtu": 1400 }
        }
      }
    }
  }
}
```
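The JSON above goes in ZeroTier's `local.conf` (on a standard Linux install, `/var/lib/zerotier-one/local.conf`). As a hypothetical convenience, the helper below writes it with a chosen interface and MTU. It overwrites any existing `local.conf`, so merge by hand if you already have settings there, and restart `zerotier-one` afterwards. Per-link MTU needs a build that includes #1844.

```shell
#!/bin/sh
# Hypothetical helper: write the active-backup bonding policy with a
# per-link MTU into a local.conf file (overwrites the file).
write_link_mtu_conf() {
    conf=$1    # path to local.conf
    iface=$2   # physical interface name, e.g. eth0
    mtu=$3     # physical MTU for that link, e.g. 1400
    cat > "$conf" <<EOF
{
  "settings": {
    "defaultBondingPolicy": "ab",
    "policies": {
      "ab": {
        "basePolicy": "active-backup",
        "failoverInterval": 30000,
        "links": {
          "$iface": { "mtu": $mtu }
        }
      }
    }
  }
}
EOF
}
# Usage: write_link_mtu_conf /var/lib/zerotier-one/local.conf eth0 1400
```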
Hopefully something can be of use. If you have further questions please let me know.