nuki_hub icon indicating copy to clipboard operation
nuki_hub copied to clipboard

Bluetooth stack issues (was: No updates in HA despite network connection / MQTT reconnect loop)

Open mundschenk-at opened this issue 1 year ago • 40 comments

This morning I noticed a peculiar phenomenon while I was leaving for work: The lock state in HA did not change, yet the ESP web interface was reachable by Wifi and told me it was connected to MQTT (and the lock) just fine. When I came home and wanted to check more deeply, I noticed the state updating was working again.

Here are my restart settings: image

The last time the lock entity in HA worked was Saturday 18:34 (unlock/lock event pair). (There was a lock event recorded by HA on Sunday 22:11, but that was just because I restarted HA at that time.)

According to Mosquitto, the last login by the ESP user yesterday was 20:14, replacing a still existing connection. Today at 08:21, the client closed the connection and opened a new one, and again at 08:24, and yet again at 08:40, and at 08:45. (Interestingly, there was a 3 second delay between closing and opening the new connection this time.)

Again at 08:47, also with the 3 second delay, and yet again 09:07 (no delay). Another fast reconnect at 09:12, at 09:23, 09:28, 09:43, 09:48, 09:50, and then finally at 10:20. No reconnects since then, and HA has recorded the lock/unlock events correctly after that.

I am not sure what was going there, but it sure looks weird. I've looked at the logs of the AP the ESP was connected to and there only three events in the relevant time frame:

Sun Aug  7 20:14:27 2022 daemon.info hostapd: if-iot: STA <nuki_hub_mac> IEEE 802.11: authenticated
Sun Aug  7 20:14:28 2022 daemon.info hostapd: if-iot: STA <nuki_hub_mac> IEEE 802.11: associated (aid 7)
Sun Aug  7 20:14:28 2022 daemon.info hostapd: if-iot: STA <nuki_hub_mac> RADIUS: starting accounting session <session1>
Sun Aug  7 20:14:28 2022 daemon.info hostapd: if-iot: STA <nuki_hub_mac> WPA: pairwise key handshake completed (RSN)
...
Mon Aug  8 06:47:42 2022 daemon.info hostapd: if-iot: STA <nuki_hub_mac> WPA: group key handshake completed (RSN)
...
Mon Aug  8 12:25:23 2022 daemon.info hostapd: if-iot: STA <nuki_hub_mac> IEEE 802.11: authenticated
Mon Aug  8 12:25:24 2022 daemon.info hostapd: if-iot: STA <nuki_hub_mac> IEEE 802.11: associated (aid 7)
Mon Aug  8 12:25:24 2022 daemon.info hostapd: if-iot: STA <nuki_hub_mac> RADIUS: starting accounting session <session2>
Mon Aug  8 12:25:24 2022 daemon.info hostapd: if-iot: STA <nuki_hub_mac> WPA: pairwise key handshake completed (RSN)

So the MQTT reconnect on Sunday 20:14 was related to a WiFi reconnect, probably after a reboot due to network timeouts (I fiddled with some settings at that time). The numerous reconnects today don't seem to have anything to do with the network itself.

mundschenk-at avatar Aug 08 '22 17:08 mundschenk-at

Sorry I'm not really sure what to do about this, and the ticket is open for a long time. Is this kind of behavior still happening? A new Arduino Core for ESP32 was just released (v2.0.5), which includes fixes from Espressif. This may or may not help.

technyon avatar Sep 17 '22 09:09 technyon

I have not noticed it again, but I've since changed my restart settings: 8F41A279-61D6-404D-A0BD-255646BA2C22

What might help for diagnosing this and similar issues would be some way to extract logging information remotely (either ex-post for the last n lines or, more likely, by supporting some kind of remote console).

mundschenk-at avatar Sep 17 '22 10:09 mundschenk-at

I've updated the core to 2.0.5, could you give the attached binary a try?

For fixes in 2.0.5 see here: https://github.com/espressif/arduino-esp32/releases/tag/2.0.5

nuki_hub_core_2.0.5.zip

technyon avatar Sep 17 '22 16:09 technyon

I can try the binary, but I'm not sure if I can really test anything (except wait and see if the behavior comes back).

mundschenk-at avatar Sep 17 '22 16:09 mundschenk-at

Since updating the firmware, my Mosquitto log has quite a lot of malformed packet errors (unfortunately, I don't have a log from before, but I never noticed them before, certainly not in such a large number).

2022-09-17 22:55:11: Client <nukihost> disconnected due to malformed packet.
2022-09-17 22:55:11: New connection from <ip>:52751 on port <port>.
2022-09-17 22:55:11: New client connected from <ip>:52751 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-17 23:00:38: Saving in-memory database to /data//mosquitto.db.
2022-09-17 23:30:39: Saving in-memory database to /data//mosquitto.db.
2022-09-18 00:00:40: Saving in-memory database to /data//mosquitto.db.
2022-09-18 00:14:08: Client <nukihost> disconnected due to malformed packet.
2022-09-18 00:14:08: New connection from <ip>:52752 on port <port>.
2022-09-18 00:14:08: New client connected from <ip>:52752 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 00:14:11: Client <nukihost> disconnected due to malformed packet.
2022-09-18 00:14:11: New connection from <ip>:52753 on port <port>.
2022-09-18 00:14:11: New client connected from <ip>:52753 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 00:19:38: Client <nukihost> disconnected due to malformed packet.
2022-09-18 00:19:38: New connection from <ip>:52754 on port <port>.
2022-09-18 00:19:38: New client connected from <ip>:52754 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 00:30:41: Saving in-memory database to /data//mosquitto.db.
2022-09-18 01:00:42: Saving in-memory database to /data//mosquitto.db.
2022-09-18 01:08:02: Bad client <nukihost> sending multiple CONNECT messages.
2022-09-18 01:08:02: Client <nukihost> disconnected due to protocol error.
2022-09-18 01:08:02: New connection from <ip>:52755 on port <port>.
2022-09-18 01:08:02: New client connected from <ip>:52755 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 01:30:43: Saving in-memory database to /data//mosquitto.db.
2022-09-18 01:56:59: Client <nukihost> disconnected due to protocol error.
2022-09-18 01:56:59: New connection from <ip>:52756 on port <port>.
2022-09-18 01:56:59: New client connected from <ip>:52756 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 01:58:02: Client <nukihost> disconnected due to malformed packet.
2022-09-18 01:58:02: New connection from <ip>:52757 on port <port>.
2022-09-18 01:58:02: New client connected from <ip>:52757 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 02:00:44: Saving in-memory database to /data//mosquitto.db.
2022-09-18 02:27:14: Client <nukihost> disconnected due to malformed packet.
2022-09-18 02:27:14: New connection from <ip>:52758 on port <port>.
2022-09-18 02:27:14: New client connected from <ip>:52758 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 02:30:45: Saving in-memory database to /data//mosquitto.db.
2022-09-18 03:00:46: Saving in-memory database to /data//mosquitto.db.
2022-09-18 03:11:20: Client <nukihost> disconnected due to protocol error.
2022-09-18 03:11:21: New connection from <ip>:52759 on port <port>.
2022-09-18 03:11:21: New client connected from <ip>:52759 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 03:24:20: Client <nukihost> disconnected due to malformed packet.
2022-09-18 03:24:21: New connection from <ip>:52760 on port <port>.
2022-09-18 03:24:21: New client connected from <ip>:52760 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 03:29:26: Client <nukihost> disconnected due to malformed packet.
2022-09-18 03:29:27: New connection from <ip>:52761 on port <port>.
2022-09-18 03:29:27: New client connected from <ip>:52761 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 03:30:47: Saving in-memory database to /data//mosquitto.db.
2022-09-18 03:56:29: Client <nukihost> disconnected due to malformed packet.
2022-09-18 03:56:29: New connection from <ip>:52762 on port <port>.
2022-09-18 03:56:29: New client connected from <ip>:52762 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 04:00:48: Saving in-memory database to /data//mosquitto.db.
2022-09-18 04:12:29: Client <nukihost> disconnected due to malformed packet.
2022-09-18 04:12:29: New connection from <ip>:52763 on port <port>.
2022-09-18 04:12:29: New client connected from <ip>:52763 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 04:30:49: Saving in-memory database to /data//mosquitto.db.
2022-09-18 04:45:32: Client <nukihost> disconnected due to protocol error.
2022-09-18 04:45:32: New connection from <ip>:52764 on port <port>.
2022-09-18 04:45:32: New client connected from <ip>:52764 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 05:00:50: Saving in-memory database to /data//mosquitto.db.
2022-09-18 05:30:51: Saving in-memory database to /data//mosquitto.db.
2022-09-18 06:00:52: Saving in-memory database to /data//mosquitto.db.
2022-09-18 06:29:08: Client <nukihost> disconnected due to malformed packet.
2022-09-18 06:29:09: New connection from <ip>:52765 on port <port>.
2022-09-18 06:29:09: New client connected from <ip>:52765 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 06:30:53: Saving in-memory database to /data//mosquitto.db.
2022-09-18 07:00:54: Saving in-memory database to /data//mosquitto.db.
2022-09-18 07:08:35: Client <nukihost> disconnected due to malformed packet.
2022-09-18 07:08:36: New connection from <ip>:52766 on port <port>.
2022-09-18 07:08:36: New client connected from <ip>:52766 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 07:30:55: Saving in-memory database to /data//mosquitto.db.
2022-09-18 08:00:56: Saving in-memory database to /data//mosquitto.db.
2022-09-18 08:27:41: Client <nukihost> disconnected due to malformed packet.
2022-09-18 08:27:42: New connection from <ip>:52767 on port <port>.
2022-09-18 08:27:42: New client connected from <ip>:52767 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 08:29:51: Client <nukihost> disconnected due to malformed packet.
2022-09-18 08:29:51: New connection from <ip>:52768 on port <port>.
2022-09-18 08:29:51: New client connected from <ip>:52768 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 08:30:57: Saving in-memory database to /data//mosquitto.db.
2022-09-18 08:39:51: Warning: Received PUBREC from <nukihost> for an unknown packet identifier 12356.
2022-09-18 08:39:51: Bad socket read/write on client <nukihost>: Invalid function arguments provided.
2022-09-18 08:39:51: New connection from <ip>:52769 on port <port>.
2022-09-18 08:39:51: New client connected from <ip>:52769 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 09:00:58: Saving in-memory database to /data//mosquitto.db.
2022-09-18 09:27:33: Client <nukihost> disconnected due to malformed packet.
2022-09-18 09:27:33: New connection from <ip>:52770 on port <port>.
2022-09-18 09:27:33: New client connected from <ip>:52770 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-18 09:30:59: Saving in-memory database to /data//mosquitto.db.
2022-09-18 09:44:42: Client <nukihost> disconnected due to malformed packet.

(No functional problem, yet. I've no set the restart timeout to -1 again, so let's see what's happening in the next few days.)

mundschenk-at avatar Sep 18 '22 11:09 mundschenk-at

Thanks. Just to be sure, could you get a log with the released firmware? If this does indeed happen with with the new core, I'd postpone updating.

technyon avatar Sep 18 '22 12:09 technyon

Sure. Is it possible that there are underlying problems that always existed but only with the new core manifest as explicit errors?

mundschenk-at avatar Sep 18 '22 14:09 mundschenk-at

Just to make sure: Should this test release have different version number or just 5.15? (If it should show a different name, I've probably been on stock all along.)

mundschenk-at avatar Sep 18 '22 14:09 mundschenk-at

It's still 5.15, but I added -core-v2.0.5.

Regarding the previous question, only Espressif knows.

technyon avatar Sep 18 '22 14:09 technyon

OK, so the errors were from stock 5.15. I'll try flashing the new binary again.

mundschenk-at avatar Sep 18 '22 14:09 mundschenk-at

Huh, so I have hard time doing an OTA. Somehow, it appears the network stack is wonky (or my ESP32 has decided it is a good time to die). Ping veers into several seconds every few packets, and sometimes even times out even when in the same room as the router. I've skipped several version (roughl 5.9 to 5.14 IIRC, but I'm not completely sure which was the earliest). Before 5.15 OTA worked fine (after you cleared up the initial issues with credentials).

Anyway, how can I install the test version via esptool.py, can I just use the other .bin files from the normal release?

mundschenk-at avatar Sep 18 '22 15:09 mundschenk-at

Yes, the other files didn't change. Just replace nuki_hub.bin and that's it.

technyon avatar Sep 18 '22 16:09 technyon

P.S.: Do you also have network problems with the "official" 5.15 ?

technyon avatar Sep 19 '22 08:09 technyon

P.S.: Do you also have network problems with the "official" 5.15 ?

Everything above was with the official release, before updating to the new core. Since the update, the Mosquitto log still looks the same (e.g. frequent protocol errors), but the integration seems to work fine. Bluetooth and WiFi RSSI seem to be much more consistent than before. I'll do a ping test and post the logs later.

mundschenk-at avatar Sep 19 '22 16:09 mundschenk-at

Much better ping after flashing 5.15#core_2.0.5 (still dropping a packet here and there, but about 10 times less and ping is mostly und 100 ms instead of 2 to 5 seconds).

Here's the current Mosquitto log:

2022-09-19 03:53:01: New connection from <ip>:58404 on port <port>.
2022-09-19 03:53:01: New client connected from <ip>:58404 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 04:01:36: Saving in-memory database to /data//mosquitto.db.
2022-09-19 04:04:15: Client <nukihost> disconnected due to malformed packet.
2022-09-19 04:04:16: New connection from <ip>:58405 on port <port>.
2022-09-19 04:04:16: New client connected from <ip>:58405 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 04:26:07: Client <nukihost> disconnected due to malformed packet.
2022-09-19 04:26:07: New connection from <ip>:58406 on port <port>.
2022-09-19 04:26:07: New client connected from <ip>:58406 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 04:31:37: Saving in-memory database to /data//mosquitto.db.
2022-09-19 05:01:38: Saving in-memory database to /data//mosquitto.db.
2022-09-19 05:31:39: Saving in-memory database to /data//mosquitto.db.
2022-09-19 06:01:40: Saving in-memory database to /data//mosquitto.db.
2022-09-19 06:31:41: Saving in-memory database to /data//mosquitto.db.
2022-09-19 06:58:22: Client <nukihost> disconnected due to malformed packet.
2022-09-19 06:58:22: New connection from <ip>:58407 on port <port>.
2022-09-19 06:58:22: New client connected from <ip>:58407 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 06:59:13: Client <nukihost> disconnected due to protocol error.
2022-09-19 06:59:13: New connection from <ip>:58408 on port <port>.
2022-09-19 06:59:13: New client connected from <ip>:58408 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 07:01:42: Saving in-memory database to /data//mosquitto.db.
2022-09-19 07:04:31: Client <nukihost> disconnected due to malformed packet.
2022-09-19 07:04:31: New connection from <ip>:58409 on port <port>.
2022-09-19 07:04:31: New client connected from <ip>:58409 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 07:31:43: Saving in-memory database to /data//mosquitto.db.
2022-09-19 08:01:44: Saving in-memory database to /data//mosquitto.db.
2022-09-19 08:24:58: Invalid QoS in PUBLISH from <nukihost>, disconnecting.
2022-09-19 08:24:58: Client <nukihost> disconnected due to malformed packet.
2022-09-19 08:24:58: New connection from <ip>:58410 on port <port>.
2022-09-19 08:24:58: New client connected from <ip>:58410 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 08:31:45: Saving in-memory database to /data//mosquitto.db.
2022-09-19 09:01:46: Saving in-memory database to /data//mosquitto.db.
2022-09-19 09:26:43: Client <nukihost> disconnected due to malformed packet.
2022-09-19 09:26:43: New connection from <ip>:58411 on port <port>.
2022-09-19 09:26:43: New client connected from <ip>:58411 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 09:31:47: Saving in-memory database to /data//mosquitto.db.
2022-09-19 10:01:48: Saving in-memory database to /data//mosquitto.db.
2022-09-19 10:31:49: Saving in-memory database to /data//mosquitto.db.
2022-09-19 10:34:34: Client <nukihost> disconnected due to malformed packet.
2022-09-19 10:34:34: New connection from <ip>:58412 on port <port>.
2022-09-19 10:34:34: New client connected from <ip>:58412 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 10:37:34: Client <nukihost> disconnected due to malformed packet.
2022-09-19 10:37:34: New connection from <ip>:58413 on port <port>.
2022-09-19 10:37:34: New client connected from <ip>:58413 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 11:01:50: Saving in-memory database to /data//mosquitto.db.
2022-09-19 11:24:46: Client <nukihost> disconnected due to protocol error.
2022-09-19 11:24:46: New connection from <ip>:58414 on port <port>.
2022-09-19 11:24:46: New client connected from <ip>:58414 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 11:31:51: Saving in-memory database to /data//mosquitto.db.
2022-09-19 11:40:13: Client <nukihost> disconnected due to malformed packet.
2022-09-19 11:40:13: New connection from <ip>:58415 on port <port>.
2022-09-19 11:40:13: New client connected from <ip>:58415 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 11:56:01: Client <nukihost> disconnected due to malformed packet.
2022-09-19 11:56:01: New connection from <ip>:58416 on port <port>.
2022-09-19 11:56:01: New client connected from <ip>:58416 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 11:56:04: Client <nukihost> disconnected due to malformed packet.
2022-09-19 11:56:04: New connection from <ip>:58417 on port <port>.
2022-09-19 11:56:04: New client connected from <ip>:58417 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 12:01:52: Saving in-memory database to /data//mosquitto.db.
2022-09-19 12:31:53: Saving in-memory database to /data//mosquitto.db.
2022-09-19 13:01:54: Saving in-memory database to /data//mosquitto.db.
2022-09-19 13:31:55: Saving in-memory database to /data//mosquitto.db.
2022-09-19 13:34:19: Client <nukihost> disconnected due to malformed packet.
2022-09-19 13:34:19: New connection from <ip>:58418 on port <port>.
2022-09-19 13:34:19: New client connected from <ip>:58418 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 13:56:08: Client <nukihost> disconnected due to malformed packet.
2022-09-19 13:56:08: New connection from <ip>:58419 on port <port>.
2022-09-19 13:56:08: New client connected from <ip>:58419 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 14:01:56: Saving in-memory database to /data//mosquitto.db.
2022-09-19 14:06:39: Client <nukihost> disconnected due to malformed packet.
2022-09-19 14:06:39: New connection from <ip>:58420 on port <port>.
2022-09-19 14:06:39: New client connected from <ip>:58420 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 14:31:57: Saving in-memory database to /data//mosquitto.db.
2022-09-19 14:39:15: Client <nukihost> disconnected due to malformed packet.
2022-09-19 14:39:15: New connection from <ip>:58421 on port <port>.
2022-09-19 14:39:15: New client connected from <ip>:58421 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 15:01:58: Saving in-memory database to /data//mosquitto.db.
2022-09-19 15:03:01: Client <nukihost> disconnected due to malformed packet.
2022-09-19 15:03:01: New connection from <ip>:58422 on port <port>.
2022-09-19 15:03:01: New client connected from <ip>:58422 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 15:31:59: Saving in-memory database to /data//mosquitto.db.
2022-09-19 15:39:13: Client <nukihost> disconnected due to malformed packet.
2022-09-19 15:39:13: New connection from <ip>:58423 on port <port>.
2022-09-19 15:39:13: New client connected from <ip>:58423 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 16:02:00: Saving in-memory database to /data//mosquitto.db.
2022-09-19 16:32:01: Saving in-memory database to /data//mosquitto.db.
2022-09-19 17:02:02: Saving in-memory database to /data//mosquitto.db.
2022-09-19 17:02:34: Client <nukihost> disconnected due to malformed packet.
2022-09-19 17:02:34: New connection from <ip>:58424 on port <port>.
2022-09-19 17:02:34: New client connected from <ip>:58424 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 17:05:58: Client <nukihost> disconnected due to malformed packet.
2022-09-19 17:05:58: New connection from <ip>:58425 on port <port>.
2022-09-19 17:05:58: New client connected from <ip>:58425 as <nukihost> (p2, c1, k15, u'<user>').
2022-09-19 17:22:34: Client <nukihost> disconnected due to malformed packet.

No real change here. I am assuming the default log level of the Mosquitto container changed during a recent update, so that might explain why those malformed packets weren't there when I checked the log with earlier versions.

mundschenk-at avatar Sep 19 '22 18:09 mundschenk-at

So this is the Bluetooth signal of the opener from yesterday till now. The huge drop in the middle is when I took the ESP32 elsewhere for flashing.

image

Not sure about the small drop this afternoon. It coincides when a member of the household arrived home, but I don't have an explanation for the change.

mundschenk-at avatar Sep 19 '22 18:09 mundschenk-at

Thanks, I'll prepare a new Release with core 2.0.5. Interesting there's still that much to fix about Wifi, considering it's one of the main features of the chip.

technyon avatar Sep 19 '22 21:09 technyon

I think I've found the reason for the unresponsive device and it has nothing to do with either WiFi connectivity or MQTT. This diagram shows the Bluetooth RSSI for both the Lock and the Opener: image

The horizontal line between roughly 6:42 and 9:55 is the same for both devices - there were no new measurements coming in via MQTT, but in this diagram, adding WiFi RSSI you can see that device was always online.

image

Apparently, the BT stack got into a weird state and stopped reporting anything until some kind of reset fixed things. (Probably a reboot of the ESP32, it's hard to tell from the graph, but there is a one minute flatline on the WiFi just before BT reporting resumes.)

image

mundschenk-at avatar Sep 20 '22 19:09 mundschenk-at

Very strange. Is this the old or the new core?

As a workaround I could reset the ESP when no beacons are received and thus the bluetooth RSSI isn't updated, but I don't really like such workarounds.

technyon avatar Sep 21 '22 07:09 technyon

P.S.: Might be worth trying to update the ble lib:

https://github.com/h2zero/NimBLE-Arduino

There's a new release from end of june

technyon avatar Sep 21 '22 07:09 technyon

Very strange. Is this the old or the new core?

As a workaround I could reset the ESP when no beacons are received and thus the bluetooth RSSI isn't updated, but I don't really like such workarounds.

The graph is from the new core, but I'm by now 100 % certain that that's what happened before. I found a similar episode in the old logs from a few weeks ago and it happened again today.

Updating the BLE library certainly sounds like a good idea, but so does a failsafe trigger similar the network timers. I agree that finding the root cause would be better, but that may not be possible without some extensive long-time logging (which is difficult in operating conditions - though it might be possible if you could implement a wireless way to access the logs).

mundschenk-at avatar Sep 21 '22 16:09 mundschenk-at

Hi,

sorry I was rather busy the last few days. I'll look into it, let's first try updating the bluetooth library.

technyon avatar Sep 23 '22 09:09 technyon

The attached binary is compiled with

  • core 2.05
  • nimble 1.40

Could you give it another try? If this doesn't fix it, I'll add another restart option ... and yes remote logging would help indeed.

nuki_hub_nimble-1.40.zip

technyon avatar Sep 23 '22 18:09 technyon

I've updated the ESP using the new binary (had some issues with it - seems the OTA is not as smooth at was before for me, feels a bit like before the issue with credentials was fixed. (There's a chance it correlates with the device switching to a "bad" access point and sticking to it between reboots, and it's not since updating to the original 5.15).

I will monitor the issue, but it will take at least three days to have some confidence it's not occurring anymore. I've added an automation to HA to trigger a restart via MQTT if the RSSI value for both Lock and Opener does not change for over 60 seconds. (The automation also logs an error and sends me a notification.)

mundschenk-at avatar Sep 23 '22 19:09 mundschenk-at

ok, thanks for testing. I'd like to release since it seems to improve WiFi issues, let's just make sure it doesn't make things worse.

Yes an automatic restart from HA is possible of course, it should work in the meantime. It would still be better to have the firmware do it, it can probably detect it a bit faster and more reliable, and makes the feature independent from the frontend used.

technyon avatar Sep 23 '22 19:09 technyon

Unfortunately, I can already tell you that the new library does not help (and that my automation also has some kind of error because it did not trigger). Had a BT dropout during the night.

mundschenk-at avatar Sep 24 '22 07:09 mundschenk-at

Since I don't really have an explanation, let's try the automatic reboot. The attached binary restarts when no beacons have been received for ~2 minutes. Can you check if it improves things? If yes, I'd make the timeout configurable.

nuki_hub_ble_reboot.zip

technyon avatar Sep 24 '22 20:09 technyon

Thanks. What's the version string supposed to be on that one (to make sure the flashing took)?

mundschenk-at avatar Sep 25 '22 07:09 mundschenk-at

I think I didn't change it. I'll just use incremental numbers for this from now on to make it easier.

I've recompiled with an updated version string:

5.15_core-2.05_nimble-1.40#01

nuki_hub#01.zip

technyon avatar Sep 25 '22 08:09 technyon

Well, weirdly enough the drop out last night was NOT the same as the previous BT dropouts. Instead, there apparently where no MQTT changes transmitted to HA (i.e. the WiFi RSSI also dropped out). I don't see anything in the router logs to suggest that WiFi actually dropped out (there are two reconnections at the beginning and the end of the four hour drop out period, and a DHCP request in the middle, so the network must have been up). Unfortunately, the Mosquitto logs have already rolled over.

Anyway, I'll monitor this version during the next week, let's see if there are any more (non-WiFi/MQTT) drop-outs.

mundschenk-at avatar Sep 25 '22 12:09 mundschenk-at