lime-packages

MESH-SAE-AUTH-FAILURE

Open rallep71 opened this issue 4 years ago • 60 comments

I have three routers in my Lime mesh network, TP WDR 4300, TP Archer C50 V3 and V4.

All built the firmware according to the instructions. What is the error?

Thu Dec 24 11:54:04 2020 daemon.notice wpa_supplicant[2606]: wlan0-mesh: MESH-SAE-AUTH-FAILURE addr=b0:4e:26:45:63:ac
Thu Dec 24 11:54:23 2020 daemon.notice wpa_supplicant[2606]: wlan0-mesh: MESH-SAE-AUTH-FAILURE addr=b0:4e:26:45:63:ac
Thu Dec 24 11:54:38 2020 daemon.notice wpa_supplicant[2606]: wlan0-mesh: MESH-SAE-AUTH-FAILURE addr=b0:4e:26:45:63:ac
Thu Dec 24 11:54:38 2020 daemon.notice wpa_supplicant[2606]: wlan0-mesh: MESH-SAE-AUTH-BLOCKED addr=b0:4e:26:45:63:ac duration=300
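
For anyone comparing logs across several nodes, the following is a minimal sketch that counts MESH-SAE-AUTH-FAILURE events per peer address and records which peers ended up in MESH-SAE-AUTH-BLOCKED. The function name `summarize` and the regular expression are illustrative assumptions based only on the log format quoted above; they are not part of wpa_supplicant or LibreMesh.

```python
import re
from collections import Counter

# Matches the two event types seen in the log excerpt above.
# The pattern is an assumption derived from that excerpt only.
EVENT_RE = re.compile(
    r"(?P<iface>[\w.-]+): (?P<event>MESH-SAE-AUTH-(?:FAILURE|BLOCKED)) "
    r"addr=(?P<addr>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})"
    r"(?: duration=(?P<duration>\d+))?",
    re.IGNORECASE,
)

def summarize(log_text):
    """Return (failures per peer, block duration per blocked peer)."""
    failures = Counter()
    blocked = {}
    for match in EVENT_RE.finditer(log_text):
        addr = match["addr"].lower()
        if match["event"].upper().endswith("FAILURE"):
            failures[addr] += 1
        else:
            blocked[addr] = int(match["duration"] or 0)
    return failures, blocked

SAMPLE = (
    "Thu Dec 24 11:54:04 2020 daemon.notice wpa_supplicant[2606]: wlan0-mesh: MESH-SAE-AUTH-FAILURE addr=b0:4e:26:45:63:ac\n"
    "Thu Dec 24 11:54:23 2020 daemon.notice wpa_supplicant[2606]: wlan0-mesh: MESH-SAE-AUTH-FAILURE addr=b0:4e:26:45:63:ac\n"
    "Thu Dec 24 11:54:38 2020 daemon.notice wpa_supplicant[2606]: wlan0-mesh: MESH-SAE-AUTH-FAILURE addr=b0:4e:26:45:63:ac\n"
    "Thu Dec 24 11:54:38 2020 daemon.notice wpa_supplicant[2606]: wlan0-mesh: MESH-SAE-AUTH-BLOCKED addr=b0:4e:26:45:63:ac duration=300\n"
)

if __name__ == "__main__":
    failures, blocked = summarize(SAMPLE)
    for addr, count in failures.items():
        print(addr, "failures:", count, "blocked for:", blocked.get(addr))
```

The script works on saved log text, so it can be run on a workstation rather than on the router itself.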

rallep71 avatar Dec 24 '20 11:12 rallep71

Can you post more details? Like the /etc/config/lime-autogen content of all the three routers or the output of lime-report command from all the routers?

ilario avatar Dec 25 '20 21:12 ilario

It'd also be very interesting to know which version of OpenWrt and which build variant of wpad-mesh you are using. wpad-mesh-wolfssl only works well on OpenWrt 19.07.5 and later (due to a performance bug in hostapd's usage of the WolfSSL API, which led to SAE failing due to a timeout). I.e. what you are experiencing is symptomatically what we saw before https://github.com/openwrt/openwrt/commit/d8d1956a8087da2fd4465c4381d9e28b91cdc1e8

dangowrt avatar Dec 25 '20 22:12 dangowrt

Hello everyone, I am posting the three reports:
https://drive.google.com/file/d/1fpvugPnB9jzcZVs5RSDhHqL-pOd0Eczq/view?usp=sharing
https://drive.google.com/file/d/1KF2GJId3gZeFXmjwmJHUO25SO9AJTM__/view?usp=sharing
https://drive.google.com/file/d/1lhEaMB6CUhxbbpzXMS17HvTtcWBTps2G/view?usp=sharing

there is also no wpad-mesh-wolfssl installed, there is wpad-mesh-openssl installed. Because when I select the profiles lime default and lime encrypt, wpad-mesh-openssl is automatically selected and I cannot change it to wpad-mesh-wolfssl.

rallep71 avatar Dec 26 '20 07:12 rallep71

@dangowrt hostapd is not installed; what is installed is hostapd-common 2019-08-08-ca8c2bd2-4. The firmware was built following https://libremesh.org/development.html from git clone -b v19.07.5 --single-branch https://git.openwrt.org/openwrt/openwrt.git
@ilario I have modified the network profile and make, changing -wpad-mesh-openssl to wpad-mesh-wolfssl. I'm now compiling new images for the three routers and will see what happens ;)

So, the new images are ready and installed. Here are the reports of the three routers; MESH-SAE-AUTH-FAILURE is still there.
https://drive.google.com/file/d/17AANtEg7LnNxE_VsBgeBitwiqJ6XLTj0/view?usp=sharing
https://drive.google.com/file/d/1YJj-oPIJQqVUJ4mWPQhJXVW3yCfFFJz-/view?usp=sharing
https://drive.google.com/file/d/1O8zk_V7bgqijrgbuff-1ODYmknU8oc4r/view?usp=sharing

rallep71 avatar Dec 26 '20 16:12 rallep71

For about 20 hours now I have had only two routers in operation, the TP Archer C50 v3 and v4; the WDR4300 is switched off. I no longer get any MESH-SAE-AUTH-FAILURE messages on the C50 v3 and v4. Very confusing.....

And now, after starting the WDR4300, the log on the root node shows:
Tue Dec 29 16:15:35 2020 daemon.notice wpa_supplicant[2492]: wlan1-mesh: new peer notification for 64:70:02:a2:fd:24
Tue Dec 29 16:15:53 2020 daemon.notice wpa_supplicant[2492]: nl80211: nl80211_recv_beacons->nl_recvmsgs failed: -5
Tue Dec 29 16:15:55 2020 daemon.notice wpa_supplicant[2492]: wlan1-mesh: MESH-SAE-AUTH-FAILURE addr=64:70:02:a2:fd:24

rallep71 avatar Dec 29 '20 14:12 rallep71

there is also no wpad-mesh-wolfssl installed, there is wpad-mesh-openssl installed. Because when I select the profiles lime default and lime encrypt, wpad-mesh-openssl is automatically selected and I cannot change it to wpad-mesh-wolfssl.

Can you confirm that now you have wpad-mesh-wolfssl on the three routers? Try not selecting any network-profiles, as in your case they should not be needed (usually are for communities willing to simplify the configuration process and selection of packages).

Are you sure that using SAE is a good idea? I have not a clue on this, but if it creates problems we cannot solve (I surely cannot, maybe @dangowrt or @aparcar?) you could stick to psk2/aes, as suggested in the lime-example.

ilario avatar Jan 02 '21 16:01 ilario

Hello Ilario, yes, I have wpad-mesh-wolfssl on all three routers. I have adjusted profile.index and profile.mk locally so that wpad-mesh-wolfssl is selected automatically. I thought we wrote not too long ago that wolfssl uses fewer resources than openssl. I don't know if SAE is a good idea, but I've seen it in the profiles of Freifunk, who also use it. I will now test everything again with the Lime sample, i.e. openssl, psk2/aes. Let's see if I still get the error.

rallep71 avatar Jan 03 '21 06:01 rallep71

Hello Ilario I will now test everything again with the Lime sample, i.e. openssl, psk2/aes. Let's see if I still get the error.

Please try simply keeping the image you have with wolfssl and just editing the /etc/config/lime-node to indicate psk2/aes

ilario avatar Jan 03 '21 09:01 ilario

Hello Ilario, I have now tested this again with psk2/aes and wolfssl; with the WDR4300 the error comes back. On the good side, the three nodes communicate with each other, I can move between the nodes with my smartphone without crashes, and switching between the nodes is fast.

I will swap out the WDR4300 again and replace it with an Archer C50 v3. Let's see what happens then.

rallep71 avatar Jan 05 '21 16:01 rallep71

Hello, problem solved. mesh compiled with psk2 + aes +openssl openwrt 19.07.6

rallep71 avatar Jan 26 '21 18:01 rallep71

I'm currently experiencing exactly the same MESH-SAE-AUTH-FAILURE followed by MESH-SAE-AUTH-BLOCKED using wpad-mesh-wolfssl on OpenWrt 21.02.0-rc.1. But it only seems to be an issue when one mesh partner reboots (we have a maintenance reboot at night once a week): the MESH-SAE-AUTH-FAILURE messages come up after that reboot, while everything worked perfectly before it.

Catfriend1 avatar May 11 '21 09:05 Catfriend1

@rallep71 did you finally use WolfSSL or OpenSSL? @Catfriend1 are you using also LibreMesh or directly OpenWrt? Can you try with the openssl version of wpad? Thanks!

ilario avatar May 11 '21 13:05 ilario

@ilario Using Openwrt directly here. I can test it, will take time until I get to observe a week.

Catfriend1 avatar May 11 '21 13:05 Catfriend1

Anyway I'm not sure that in the LibreMesh community there are many people actually encrypting 802.11s. Did you ask also in OpenWrt forums?

ilario avatar May 11 '21 13:05 ilario

In the Element chatroom (see here for the direction) @egon0 mentioned that the solution was to switch to OpenSSL.

ilario avatar May 16 '21 17:05 ilario

Generally this seems to be performance/timing related. The bug in hostapd/wpa_supplicant/wpad which caused those symptoms when using WolfSSL previously was to use a too costly function to generate random numbers (generating random prime numbers instead of just arbitrary random numbers). Once this had been fixed, things cleared up here and from what I can tell, the bug is gone now, running OpenWrt 19.07.6 seems fairly stable with wpad-mesh-wolfssl.
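
To illustrate why generating a random prime is so much more expensive than generating an arbitrary random number, here is a minimal sketch. It is a toy model only: the function names are hypothetical, the 24-bit size is chosen so that trial division stays fast, and nothing here reflects how hostapd or WolfSSL actually implement either operation.

```python
import random

def is_prime(n):
    """Trial division; adequate for the small toy sizes used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

def random_number(bits, rng):
    """One draw is always enough for an arbitrary random number."""
    return rng.getrandbits(bits), 1

def random_prime(bits, rng):
    """Keep drawing odd candidates of full bit length until one is prime.

    Returns the prime and the number of candidates that had to be drawn
    (each of which also costs a primality test).
    """
    draws = 0
    while True:
        draws += 1
        candidate = rng.getrandbits(bits) | 1 | (1 << (bits - 1))
        if is_prime(candidate):
            return candidate, draws

if __name__ == "__main__":
    rng = random.Random()
    _, plain_draws = random_number(24, rng)
    prime, prime_draws = random_prime(24, rng)
    print("random number draws:", plain_draws)
    print("random prime draws: ", prime_draws, "->", prime)
```

On a slow router CPU and with cryptographic sizes, the repeated draw-and-test loop is what would push a handshake past its timeout, which fits the symptoms described above.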

root@stannebeinplatz-m5:~# opkg list-installed | grep wolf
libwolfssl24 - 4.6.0-stable-1
wpad-mesh-wolfssl - 2019-08-08-ca8c2bd2-4

root@rdntz-stannebeinplatz:~# uptime
 06:21:41 up 95 days, 15:02,  load average: 0.58, 0.49, 0.46

root@rdntz-stannebeinplatz:~# ifconfig
[...]
wlan0-mesh_13 Link encap:Ethernet  HWaddr F0:9F:C2:8C:81:7A  
          inet addr:169.254.129.122  Bcast:255.255.255.255  Mask:255.255.255.255
          inet6 addr: fd70:6bf5:5eab:b59d:40ed:cc17:5c70:1dd3/16 Scope:Global
          inet6 addr: fe80::f29f:c2ff:fe8c:817a/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1206901449 errors:0 dropped:0 overruns:0 frame:0
          TX packets:681661127 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1552561276067 (1.4 TiB)  TX bytes:123728430764 (115.2 GiB)

root@rdntz-stannebeinplatz:~# iw dev wlan0-mesh station dump
Station 80:2a:a8:bc:16:bb (on wlan0-mesh)
	inactive time:	0 ms
	rx bytes:	184859508823
	rx packets:	161009141
	tx bytes:	25040698030
	tx packets:	79598525
	tx retries:	6421311
	tx failed:	133
	rx drop misc:	138975
	signal:  	-65 [-73, -68] dBm
	signal avg:	-64 [-72, -66] dBm
	Toffset:	18446739638920988636 us
	tx bitrate:	162.0 MBit/s MCS 12 40MHz
	rx bitrate:	120.0 MBit/s MCS 11 40MHz short GI
	rx duration:	0 us
	last ack signal:35 dBm
	expected throughput:	49.163Mbps
	mesh llid:	0
	mesh plid:	0
	mesh plink:	ESTAB
	mesh local PS mode:	ACTIVE
	mesh peer PS mode:	ACTIVE
	mesh non-peer PS mode:	ACTIVE
	authorized:	yes
	authenticated:	yes
	associated:	yes
	preamble:	long
	WMM/WME:	yes
	MFP:		yes
	TDLS peer:	no
	DTIM period:	2
	beacon interval:100
	connected time:	928198 seconds
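
When checking many nodes, the relevant fields of such a station dump can be extracted automatically. The following is a minimal sketch that parses text in the format shown above and reports whether each peer link looks healthy; `parse_station_dump` and `mesh_link_ok` are hypothetical helper names, and the choice of fields that define "healthy" is an assumption.

```python
def parse_station_dump(text):
    """Parse `iw dev <iface> station dump` text into {mac: {field: value}}."""
    stations = {}
    current = None
    for line in text.splitlines():
        if line.startswith("Station "):
            mac = line.split()[1]
            current = stations.setdefault(mac, {})
        elif current is not None and ":" in line:
            # Split at the first colon only; values may contain colons.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return stations

def mesh_link_ok(fields):
    """Assumed health criteria: peer link established and authenticated."""
    return (
        fields.get("mesh plink") == "ESTAB"
        and fields.get("authenticated") == "yes"
        and fields.get("authorized") == "yes"
    )

SAMPLE = (
    "Station 80:2a:a8:bc:16:bb (on wlan0-mesh)\n"
    "\tinactive time:\t0 ms\n"
    "\tsignal:  \t-65 [-73, -68] dBm\n"
    "\tmesh plink:\tESTAB\n"
    "\tauthorized:\tyes\n"
    "\tauthenticated:\tyes\n"
    "\tconnected time:\t928198 seconds\n"
)

if __name__ == "__main__":
    for mac, fields in parse_station_dump(SAMPLE).items():
        print(mac, "OK" if mesh_link_ok(fields) else "NOT OK")
```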

In order to address what hence seems to be a regression, it'd be great to know whether

  • this only occurs with 21.02-rc1 (but not with 19.07.6) also in your setup
  • which hardware you are using (for example, Ubiquiti XM boards have been phased out of our network due to too little RAM; by now everything is at least MIPS74Kc with 64MiB of RAM)

dangowrt avatar May 17 '21 06:05 dangowrt

@dangowrt I'm using TP-Link Archer C7 v2 and v5 with ath10k non-ct drivers here. It only occurs in 21.02-rc1, after running for days (a week?). Immediate reboots or Wi-Fi restarts cannot reproduce the problem; the weekly reboot reproduces it 100% of the time.

iw dev wlan0 (my mesh interface) shows MTU 1532 (due to batman-adv usage and configuration).

iw dev wlan0 station dump results in

failed to parse nested attributes

Catfriend1 avatar May 17 '21 08:05 Catfriend1

thx @dangowrt for these infos and clarification.

i will have a look into this, using vanilla openwrt on wdr3600 and 2x archer c7 v5. i will try to upgrade to 19.07.6 and give it a try.

egon0 avatar May 18 '21 10:05 egon0

I have experienced the same issue with 21.02-rc1, but using wpad-mesh-openssl, not wolfssl. In the log first I get a new peer notification, then five times MESH-SAE-AUTH-FAILURE, followed by MESH-SAE-AUTH-BLOCKED for 300 seconds.

Hardware is TP-Link WDR4300, running 802.11s encrypted mesh with BATMAN.

goligo avatar Jun 02 '21 19:06 goligo

I'm currently trying Openwrt 21.02.0-rc2 which has another getrandom package version shipped if that matters.

Catfriend1 avatar Jun 02 '21 20:06 Catfriend1

Archer C2 v1 OpenWrt 19.07-SNAPSHOT r11328-81266d9001 The same issue. MESH-SAE-AUTH-FAILURE followed by MESH-SAE-AUTH-BLOCKED for 300 seconds.

Without encryption mesh works properly.

Archer C7 v5, OpenWrt 19.07.7, r11306-c4a6851c72, encrypted mesh works properly on 5GHz and on 2.4GHz.

I don't know. Reading above I think that C2 and WDR4300 are not fast enough to generate messages for handshake?

Archer C50 + Archer C2 also MESH-SAE-AUTH-FAILURE followed by MESH-SAE-AUTH-BLOCKED.

mickeyreg avatar Jun 03 '21 09:06 mickeyreg

@mickeyreg For me, everything was ok on Archer C7v2|5 until I upgraded beyond 19.07.7 (snapshot, 21.02 rcX)

Catfriend1 avatar Jun 03 '21 09:06 Catfriend1

I'm new to mesh configuration. My first try was on an Archer C7 v5. I could not get it working on 21.02, so I downgraded to 19.07 and successfully configured everything. Wireless did not work at all on my C7 with 21.02 :( From what I read above, I have a slightly older SNAPSHOT on the C7 than on the C2. I'll try to upgrade the C7 next week.

mickeyreg avatar Jun 03 '21 20:06 mickeyreg

@mickeyreg please see https://forum.openwrt.org/t/state-of-tp-link-archer-c7v2-v5-in-2021/95787 My mesh works on 21.02 but sometimes gets these auth failures.

Catfriend1 avatar Jun 03 '21 20:06 Catfriend1

Hello, problem solved. mesh compiled with psk2 + aes +openssl openwrt 19.07.6

Hi all, I tried to set this up with psk2+aes, but either I am doing something wrong or it does not allow me to use psk2+aes as the encryption for the mesh.

Could you please share the /config/ file where you set this up?

djStolen avatar Jun 06 '21 11:06 djStolen

Hi guys,

does the Hardware have to set

sta->sae->state = SAE_ACCEPTED

because I cannot find it anywhere in the code?

If that's not the case, and my search is not mistaken, then it is to be expected that it fails every time on this check:

if (sta->sae->state != SAE_ACCEPTED)

in void mesh_auth_timer(void *eloop_ctx, void *user_data)
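
To make the question concrete, here is a minimal toy model of a timer check of this kind, written in Python rather than C. It is not the hostapd code: `Peer`, `on_auth_timeout` and `AUTH_RETRY_LIMIT` are hypothetical names, the retry limit of 3 is an assumption, and only the two event names and duration=300 are taken from the logs in this thread. The point it illustrates is that such a check reports a failure only if SAE has not reached the accepted state by the time the timer fires, wherever that state is actually set.

```python
from dataclasses import dataclass

AUTH_RETRY_LIMIT = 3    # assumption for illustration, not taken from the thread
BLOCK_DURATION_S = 300  # matches "duration=300" in the quoted logs

@dataclass
class Peer:
    addr: str
    sae_accepted: bool = False  # stands in for sta->sae->state == SAE_ACCEPTED
    retries: int = 0
    blocked: bool = False

def on_auth_timeout(peer):
    """Return the log events that one timer expiry would produce in this model."""
    if peer.sae_accepted or peer.blocked:
        return []  # handshake finished in time, or peer already blocked
    events = [f"MESH-SAE-AUTH-FAILURE addr={peer.addr}"]
    if peer.retries >= AUTH_RETRY_LIMIT:
        # Retries exhausted: block the peer instead of trying again.
        peer.blocked = True
        events.append(
            f"MESH-SAE-AUTH-BLOCKED addr={peer.addr} duration={BLOCK_DURATION_S}"
        )
    # Otherwise a real implementation would restart SAE and re-arm the timer.
    peer.retries += 1
    return events

if __name__ == "__main__":
    peer = Peer("50:d4:f7:15:1f:0a")
    for _ in range(5):
        for event in on_auth_timeout(peer):
            print(event)
```

With the assumed limit, a peer that never completes SAE produces four failure messages and then one blocked message, while a peer that completes SAE before the timer fires produces none.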

djStolen avatar Jun 07 '21 14:06 djStolen

@djStolen I don't know the implementation but your finding sounds reasonable.

Catfriend1 avatar Jun 07 '21 16:06 Catfriend1

I have tested the C2 with 19.07.7, 19.07.6 and 19.07.5; none of them works with SAE. All of these versions now ship newer versions of the wolfssl library than the one mentioned above.

I have tested also C7 v5 with 19.07 SNAPSHOT - works with authentication without problems.

mickeyreg avatar Jun 07 '21 17:06 mickeyreg

The C2 is slower than the C7, but the setting is: #define MESH_AUTH_TIMEOUT 10 <- that is 10 seconds, which seems too long to be hit by a real timeout...

On the C7 it took less than 1 second:

Mon Jun  7 12:17:27 2021 daemon.notice wpa_supplicant[1936]: wlan0: new peer notification for 50:d4:f7:15:15:29
Mon Jun  7 12:17:27 2021 daemon.notice wpa_supplicant[1936]: wlan0: mesh plink with 50:d4:f7:15:15:29 established
Mon Jun  7 12:17:27 2021 daemon.notice wpa_supplicant[1936]: wlan0: MESH-PEER-CONNECTED 50:d4:f7:15:15:29

I found the problem also in C7 logs:

Wed Jun  2 09:27:06 2021 daemon.notice wpa_supplicant[1936]: wlan0: new peer notification for 50:d4:f7:15:1f:0a
Wed Jun  2 09:27:16 2021 daemon.notice wpa_supplicant[1936]: wlan0: MESH-SAE-AUTH-FAILURE addr=50:d4:f7:15:1f:0a
Wed Jun  2 09:27:31 2021 daemon.notice wpa_supplicant[1936]: wlan0: MESH-SAE-AUTH-FAILURE addr=50:d4:f7:15:1f:0a
Wed Jun  2 09:27:49 2021 daemon.notice wpa_supplicant[1936]: wlan0: MESH-SAE-AUTH-FAILURE addr=50:d4:f7:15:1f:0a
Wed Jun  2 09:28:07 2021 daemon.notice wpa_supplicant[1936]: wlan0: MESH-SAE-AUTH-FAILURE addr=50:d4:f7:15:1f:0a
Wed Jun  2 09:28:07 2021 daemon.notice wpa_supplicant[1936]: wlan0: MESH-SAE-AUTH-BLOCKED addr=50:d4:f7:15:1f:0a duration=300

Message is (can be...) generated because of timeout, but also because of ... I don't know ... broken frames?
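
The spacing of the events in the C7 log above can be checked with a few lines of Python. This is a minimal sketch; `event_gaps` is a hypothetical helper and the timestamp format is assumed from the quoted syslog lines.

```python
from datetime import datetime

LOG = """\
Wed Jun  2 09:27:06 2021 daemon.notice wpa_supplicant[1936]: wlan0: new peer notification for 50:d4:f7:15:1f:0a
Wed Jun  2 09:27:16 2021 daemon.notice wpa_supplicant[1936]: wlan0: MESH-SAE-AUTH-FAILURE addr=50:d4:f7:15:1f:0a
Wed Jun  2 09:27:31 2021 daemon.notice wpa_supplicant[1936]: wlan0: MESH-SAE-AUTH-FAILURE addr=50:d4:f7:15:1f:0a
Wed Jun  2 09:27:49 2021 daemon.notice wpa_supplicant[1936]: wlan0: MESH-SAE-AUTH-FAILURE addr=50:d4:f7:15:1f:0a
Wed Jun  2 09:28:07 2021 daemon.notice wpa_supplicant[1936]: wlan0: MESH-SAE-AUTH-FAILURE addr=50:d4:f7:15:1f:0a
Wed Jun  2 09:28:07 2021 daemon.notice wpa_supplicant[1936]: wlan0: MESH-SAE-AUTH-BLOCKED addr=50:d4:f7:15:1f:0a duration=300
"""

def event_gaps(log_text):
    """Return the gaps in seconds between consecutive log lines."""
    times = []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        # The first five whitespace-separated fields form the timestamp.
        stamp = " ".join(line.split()[:5])
        times.append(datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y"))
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]

if __name__ == "__main__":
    print(event_gaps(LOG))
```

The first failure appears 10 seconds after the new peer notification, which matches the MESH_AUTH_TIMEOUT value quoted above; the later gaps are 15, 18 and 18 seconds. That is consistent with the messages being driven by a timer, but it does not by itself show why the handshake did not complete.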

mickeyreg avatar Jun 07 '21 18:06 mickeyreg

I have tested the C2 with 19.07.7, 19.07.6 and 19.07.5; none of them works with SAE. All of these versions now ship newer versions of the wolfssl library than the one mentioned above.

I have tested also C7 v5 with 19.07 SNAPSHOT - works with authentication without problems.

So you took the latest state of the repository, right?

djStolen avatar Jun 07 '21 20:06 djStolen