esp32-ble2mqtt icon indicating copy to clipboard operation
esp32-ble2mqtt copied to clipboard

Restarts every 5 minutes and eventually becomes unresponsive.

Open dullage opened this issue 3 years ago • 17 comments

Love this project! After trying and failing to build something custom for my use case (controlling a BLE motorised blind) I was very excited to find this project (and very grateful for your work).

After getting everything configured, built and flashed it worked great! Unfortunately, though, it only seems to work great for a short while, then either one of two things happen:

  1. The ESP32 restarts. Or at least the firmware restarts as I can see the history of Uptime values never seem to get above 360 (6 minutes).
  2. It becomes completely unresponsive.

I used remote logging to capture a few of these lock ups and in each case I see a very similar pattern. The log below shows a complete history from initial connection to lock-up:

2021-02-01 20:43:58.959820 (BLE2MQTT-AAB8.lan): I (8991) BLE2MQTT: Connected to the network, connecting to MQTT
2021-02-01 20:43:59.891463 (BLE2MQTT-AAB8.lan): I (9001) MQTT: Connecting MQTT client
2021-02-01 20:43:59.893458 (BLE2MQTT-AAB8.lan): I (9001) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
2021-02-01 20:43:59.902436 (BLE2MQTT-AAB8.lan): I (9941) BLE2MQTT: Discovered BLE device: 2d:74:aa:f2:6c:a2 (RSSI: -68), not connecting
2021-02-01 20:44:00.109021 (BLE2MQTT-AAB8.lan): I (10501) BLE2MQTT: Discovered BLE device: de:66:92:69:03:8c 
(RSSI: -79), connecting
2021-02-01 20:44:00.222717 (BLE2MQTT-AAB8.lan): W (10611) BT_APPL: bta_gattc_conn_cback() - cif=1 connected=0 conn_id=1 reason=0x003e
2021-02-01 20:44:00.290536 (BLE2MQTT-AAB8.lan): E (10621) BLE: Open failed, status = 0x85
2021-02-01 20:44:00.291534 (BLE2MQTT-AAB8.lan): I (10641) BLE2MQTT: Discovered BLE device: de:66:92:69:03:8c 
(RSSI: -78), connecting
2021-02-01 20:44:00.295523 (BLE2MQTT-AAB8.lan): I (10641) BLE2MQTT: Discovered BLE device: 23:b7:51:3d:dd:aa 
(RSSI: -81), not connecting
2021-02-01 20:44:00.388714 (BLE2MQTT-AAB8.lan): W (10751) BT_APPL: bta_gattc_conn_cback() - cif=1 connected=0 conn_id=1 reason=0x003e
2021-02-01 20:44:00.389712 (BLE2MQTT-AAB8.lan): E (10751) BLE: Open failed, status = 0x85
2021-02-01 20:44:00.708331 (BLE2MQTT-AAB8.lan): I (11091) BLE2MQTT: Discovered BLE device: de:66:92:69:03:8c 
(RSSI: -85), connecting
2021-02-01 20:44:00.923229 (BLE2MQTT-AAB8.lan): W (11261) BT_APPL: bta_gattc_conn_cback() - cif=1 connected=0 conn_id=1 reason=0x003e
2021-02-01 20:44:00.923229 (BLE2MQTT-AAB8.lan): E (11261) BLE: Open failed, status = 0x85
2021-02-01 20:44:00.923229 (BLE2MQTT-AAB8.lan): I (11281) BLE2MQTT: Discovered BLE device: 6a:97:25:03:7a:4c 
(RSSI: -85), not connecting
2021-02-01 20:44:00.933204 (BLE2MQTT-AAB8.lan): I (11291) BLE2MQTT: Discovered BLE device: 56:12:c5:68:3f:ac 
(RSSI: -64), not connecting
2021-02-01 20:44:01.241871 (BLE2MQTT-AAB8.lan): I (11601) BLE2MQTT: Discovered BLE device: de:66:92:69:03:8c 
(RSSI: -82), connecting
2021-02-01 20:44:01.898856 (BLE2MQTT-AAB8.lan): I (12211) BLE2MQTT: Discovered BLE device: 70:73:cb:ea:eb:bb 
(RSSI: -86), not connecting
2021-02-01 20:44:02.398413 (BLE2MQTT-AAB8.lan): I (12771) BLE2MQTT: Discovered BLE device: 56:36:66:89:28:76 
(RSSI: -90), not connecting
2021-02-01 20:44:03.316879 (BLE2MQTT-AAB8.lan): I (13571) BLE2MQTT: Discovered BLE device: 63:57:cd:c3:9a:d3 
(RSSI: -91), not connecting
2021-02-01 20:44:03.317877 (BLE2MQTT-AAB8.lan): I (13651) BLE2MQTT: Discovered BLE device: 40:40:40:12:80:b2 
(RSSI: -84), not connecting
2021-02-01 20:44:04.158517 (BLE2MQTT-AAB8.lan): I (14451) BLE2MQTT: Discovered BLE device: 73:c0:c5:d9:0a:07 
(RSSI: -78), not connecting
2021-02-01 20:44:09.169852 (BLE2MQTT-AAB8.lan): I (19501) BLE2MQTT: Connected to device: de:66:92:69:03:8c, scanning
2021-02-01 20:44:09.333787 (BLE2MQTT-AAB8.lan): E (19701) BT_APPL: service change write ccc failed
2021-02-01 20:44:11.916087 (BLE2MQTT-AAB8.lan): I (22151) BLE2MQTT: Publishing: de:66:92:69:03:8c/GenericAccess/DeviceName = S
2021-02-01 20:44:12.168417 (BLE2MQTT-AAB8.lan): I (22511) BLE2MQTT: Publishing: de:66:92:69:03:8c/GenericAccess/Appearance = 0
2021-02-01 20:44:12.301480 (BLE2MQTT-AAB8.lan): I (22631) BLE2MQTT: Publishing: de:66:92:69:03:8c/GenericAccess/PeripheralPreferredConnectionParameters = 16,32,0,80
2021-02-01 20:44:12.628205 (BLE2MQTT-AAB8.lan): I (22991) BLE2MQTT: Publishing: de:66:92:69:03:8c/DeviceInformation/ManufacturerNameString = WazombiLabs
2021-02-01 20:44:12.956356 (BLE2MQTT-AAB8.lan): I (23311) BLE2MQTT: Publishing: de:66:92:69:03:8c/DeviceInformation/SerialNumberString = R-0103-kristjan-202010151321-202c1e9-6.1
2021-02-01 20:44:13.191649 (BLE2MQTT-AAB8.lan): I (23471) BLE2MQTT: Publishing: de:66:92:69:03:8c/DeviceInformation/HardwareRevisionString = BLINDY_V9
2021-02-01 20:44:14.047338 (BLE2MQTT-AAB8.lan): I (23631) BLE2MQTT: Publishing: de:66:92:69:03:8c/DeviceInformation/FirmwareRevisionString = 5baefa6
2021-02-01 20:44:14.047338 (BLE2MQTT-AAB8.lan): I (23751) BLE2MQTT: Publishing: de:66:92:69:03:8c/DeviceInformation/SoftwareRevisionString = v2.2.3
2021-02-01 20:44:14.048336 (BLE2MQTT-AAB8.lan): I (23871) BLE2MQTT: Publishing: de:66:92:69:03:8c/BatteryService/BatteryLevel = 66
2021-02-01 20:44:14.048336 (BLE2MQTT-AAB8.lan): I (24191) BLE2MQTT: Publishing: de:66:92:69:03:8c/00001554-b87f-490c-92cb-11ba5ea5167c/00001555-b87f-490c-92cb-11ba5ea5167c = 29,104,24,96
2021-02-01 20:44:14.342958 (BLE2MQTT-AAB8.lan): I (24691) BLE2MQTT: Discovered BLE device: 1c:40:31:7d:d6:96 
(RSSI: -87), not connecting
2021-02-01 20:44:14.449847 (BLE2MQTT-AAB8.lan): I (24751) BLE2MQTT: Publishing: de:66:92:69:03:8c/00001861-b87f-490c-92cb-11ba5ea5167c/00001525-b87f-490c-92cb-11ba5ea5167c = 100,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2021-02-01 20:44:14.870529 (BLE2MQTT-AAB8.lan): I (25071) BLE2MQTT: Publishing: de:66:92:69:03:8c/00001861-b87f-490c-92cb-11ba5ea5167c/00001526-b87f-490c-92cb-11ba5ea5167c = 100
2021-02-01 20:44:14.877077 (BLE2MQTT-AAB8.lan): I (25271) BLE2MQTT: Publishing: de:66:92:69:03:8c/00001861-b87f-490c-92cb-11ba5ea5167c/00001527-b87f-490c-92cb-11ba5ea5167c = 51,1,0,55,55,55,55,55,55,55,55,55,55,55,55,55,55
2021-02-01 20:44:15.218825 (BLE2MQTT-AAB8.lan): I (25391) BLE2MQTT: Publishing: de:66:92:69:03:8c/00001861-b87f-490c-92cb-11ba5ea5167c/00001528-b87f-490c-92cb-11ba5ea5167c = 48,51,1,0,64,56,0,0,127,100,129,0,0,0,0,50,0,0
2021-02-01 20:44:15.350292 (BLE2MQTT-AAB8.lan): I (25711) BLE2MQTT: Publishing: de:66:92:69:03:8c/00001861-b87f-490c-92cb-11ba5ea5167c/00001529-b87f-490c-92cb-11ba5ea5167c = 169
2021-02-01 20:44:16.134915 (BLE2MQTT-AAB8.lan): I (25991) BLE2MQTT: Publishing: de:66:92:69:03:8c/00001861-b87f-490c-92cb-11ba5ea5167c/00001530-b87f-490c-92cb-11ba5ea5167c = 0
2021-02-01 20:44:16.135466 (BLE2MQTT-AAB8.lan): I (26111) BLE2MQTT: Publishing: de:66:92:69:03:8c/00001861-b87f-490c-92cb-11ba5ea5167c/00001531-b87f-490c-92cb-11ba5ea5167c = 0
2021-02-01 20:44:16.135972 (BLE2MQTT-AAB8.lan): I (26311) BLE2MQTT: Publishing: de:66:92:69:03:8c/00001861-b87f-490c-92cb-11ba5ea5167c/00001533-b87f-490c-92cb-11ba5ea5167c = 0
2021-02-01 20:44:16.135972 (BLE2MQTT-AAB8.lan): I (26431) BLE2MQTT: Publishing: de:66:92:69:03:8c/00001861-b87f-490c-92cb-11ba5ea5167c/0000ba71-b87f-490c-92cb-11ba5ea5167c = 144,1
2021-02-01 20:44:16.585717 (BLE2MQTT-AAB8.lan): I (26711) BLE2MQTT: Publishing: de:66:92:69:03:8c/00001861-b87f-490c-92cb-11ba5ea5167c/0000ba72-b87f-490c-92cb-11ba5ea5167c = 0
2021-02-01 20:44:16.930712 (BLE2MQTT-AAB8.lan): I (27031) BLE2MQTT: Publishing: de:66:92:69:03:8c/00001890-b87f-490c-92cb-11ba5ea5167c/00001891-b87f-490c-92cb-11ba5ea5167c = 177
2021-02-01 20:44:16.935695 (BLE2MQTT-AAB8.lan): I (27271) BLE2MQTT: Publishing: de:66:92:69:03:8c/00001890-b87f-490c-92cb-11ba5ea5167c/00001892-b87f-490c-92cb-11ba5ea5167c = 77,97,115,116,101,114,32,66,101,100,114,111,111,109,48,0
2021-02-01 20:44:17.199531 (BLE2MQTT-AAB8.lan): I (27471) BLE2MQTT: Publishing: de:66:92:69:03:8c/00001890-b87f-490c-92cb-11ba5ea5167c/00001893-b87f-490c-92cb-11ba5ea5167c = 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2021-02-01 20:44:17.365005 (BLE2MQTT-AAB8.lan): I (27711) BLE2MQTT: Publishing: de:66:92:69:03:8c/00001890-b87f-490c-92cb-11ba5ea5167c/00001894-b87f-490c-92cb-11ba5ea5167c = 0,0,0,0
2021-02-01 20:44:17.825591 (BLE2MQTT-AAB8.lan): I (28071) BLE2MQTT: Publishing: de:66:92:69:03:8c/00001890-b87f-490c-92cb-11ba5ea5167c/00001895-b87f-490c-92cb-11ba5ea5167c = 68,69,45,54,54,45,57,50,45,54,57,45,48,51,45,56,67
2021-02-01 20:44:17.962773 (BLE2MQTT-AAB8.lan): I (28271) BLE2MQTT: Publishing: de:66:92:69:03:8c/00001890-b87f-490c-92cb-11ba5ea5167c/00001896-b87f-490c-92cb-11ba5ea5167c = 4,0,0,0,76,175,22,3,103,62,0,0,0,60,0,0,0,0,0,0
2021-02-01 20:44:18.420791 (BLE2MQTT-AAB8.lan): I (28771) BLE2MQTT: Discovered BLE device: 6a:4a:fa:7a:2f:49 
(RSSI: -99), not connecting
2021-02-01 20:44:23.949993 (BLE2MQTT-AAB8.lan): I (34291) BLE2MQTT: Discovered BLE device: 5d:53:0d:aa:af:63 
(RSSI: -91), not connecting
2021-02-01 20:45:16.891550 (BLE2MQTT-AAB8.lan): I (87251) BLE2MQTT: Discovered BLE device: d3:07:42:2d:15:67 
(RSSI: -70), not connecting
2021-02-01 20:46:00.632061 (BLE2MQTT-AAB8.lan): I (131021) BLE2MQTT: Discovered BLE device: 09:88:72:89:cc:a6 (RSSI: -71), not connecting
2021-02-01 20:46:04.516954 (BLE2MQTT-AAB8.lan): I (134851) BLE2MQTT: Discovered BLE device: 5a:0b:7e:0d:06:ed (RSSI: -70), not connecting
2021-02-01 20:48:00.830524 (BLE2MQTT-AAB8.lan): W (251151) BT_APPL: bta_gattc_conn_cback() - cif=1 connected=0 conn_id=1 reason=0x003b
2021-02-01 20:48:00.830524 (BLE2MQTT-AAB8.lan): I (251161) BLE: Connection closed, reason = 0x3b
2021-02-01 20:48:00.830524 (BLE2MQTT-AAB8.lan): I (251161) BLE2MQTT: Disconnected from device: de:66:92:69:03:8c
2021-02-01 20:48:01.135219 (BLE2MQTT-AAB8.lan): I (251501) BLE2MQTT: Discovered BLE device: de:66:92:69:03:8c (RSSI: -80), connecting
2021-02-01 20:48:01.241934 (BLE2MQTT-AAB8.lan): W (251621) BT_APPL: bta_gattc_conn_cback() - cif=1 connected=0 conn_id=1 reason=0x003e
2021-02-01 20:48:01.241934 (BLE2MQTT-AAB8.lan): E (251621) BLE: Open failed, status = 0x85
2021-02-01 20:48:01.258888 (BLE2MQTT-AAB8.lan): I (251641) BLE2MQTT: Discovered BLE device: de:66:92:69:03:8c (RSSI: -81), connecting
2021-02-01 20:48:03.595035 (BLE2MQTT-AAB8.lan): I (253951) BLE2MQTT: Discovered BLE device: 7b:8e:67:36:e1:7e (RSSI: -85), not connecting
2021-02-01 20:48:06.176960 (BLE2MQTT-AAB8.lan): I (256501) BLE2MQTT: Connected to device: de:66:92:69:03:8c, 
scanning
2021-02-01 20:48:06.884011 (BLE2MQTT-AAB8.lan): I (257221) BLE2MQTT: Publishing: de:66:92:69:03:8c/GenericAccess/DeviceName = S
2021-02-01 20:48:06.975766 (BLE2MQTT-AAB8.lan): I (257341) BLE2MQTT: Publishing: de:66:92:69:03:8c/GenericAccess/Appearance = 0
2021-02-01 20:48:07.121514 (BLE2MQTT-AAB8.lan): I (257501) BLE2MQTT: Publishing: de:66:92:69:03:8c/GenericAccess/PeripheralPreferredConnectionParameters = 16,32,0,80
2021-02-01 20:48:07.283083 (BLE2MQTT-AAB8.lan): E (257631) BT_APPL: service change write ccc failed
2021-02-01 20:48:07.283083 (BLE2MQTT-AAB8.lan): E (257641) BT_GATT: GATTC_Discover GATT_BUSY conn_id = 1
2021-02-01 20:48:07.285077 (BLE2MQTT-AAB8.lan): E (257641) BT_APPL: discovery on server failed
2021-02-01 20:48:07.295051 (BLE2MQTT-AAB8.lan): E (257641) BT_GATT: GATTC_Read GATT_BUSY conn_id = 1
2021-02-01 20:48:07.295051 (BLE2MQTT-AAB8.lan): E (257641) BT_GATT: GATTC_Read GATT_BUSY conn_id = 1
2021-02-01 20:48:07.296049 (BLE2MQTT-AAB8.lan): E (257651) BT_GATT: GATTC_Read GATT_BUSY conn_id = 1
2021-02-01 20:48:07.296049 (BLE2MQTT-AAB8.lan): E (257661) BT_GATT: GATTC_Read GATT_BUSY conn_id = 1
2021-02-01 20:48:07.296049 (BLE2MQTT-AAB8.lan): E (257661) BT_GATT: GATTC_Read GATT_BUSY conn_id = 1
2021-02-01 20:48:07.296049 (BLE2MQTT-AAB8.lan): E (257671) BT_GATT: GATTC_Read GATT_BUSY conn_id = 1
2021-02-01 20:48:07.297046 (BLE2MQTT-AAB8.lan): E (257671) BT_GATT: GATTC_Read GATT_BUSY conn_id = 1
2021-02-01 20:48:07.297046 (BLE2MQTT-AAB8.lan): E (257681) BT_GATT: GATTC_Read GATT_BUSY conn_id = 1
2021-02-01 20:48:07.303029 (BLE2MQTT-AAB8.lan): E (257681) BT_GATT: GATTC_Read GATT_BUSY conn_id = 1
2021-02-01 20:48:07.306025 (BLE2MQTT-AAB8.lan): E (257691) BT_GATT: GATTC_Read GATT_BUSY conn_id = 1
2021-02-01 20:48:07.314998 (BLE2MQTT-AAB8.lan): E (257701) BT_GATT: GATTC_Read GATT_BUSY conn_id = 1
2021-02-01 20:48:07.317990 (BLE2MQTT-AAB8.lan): E (257701) BT_GATT: GATTC_Read GATT_BUSY conn_id = 1
2021-02-01 20:48:07.323974 (BLE2MQTT-AAB8.lan): E (257711) BT_GATT: GATTC_Read GATT_BUSY conn_id = 1

This log starts after one of the restarts I mentioned. If I look at the log entries immediately above this I see the same pattern. So I believe it's the same problem each time only some of the time it's able to recover by itself and some of the time it isn't.

For reference, this is my config:

{
  "wifi": {
    "ssid": "MySSID",
    "password": "MyPassword"
  },
  "mqtt": {
    "server": {
      "host": "MyHost",
      "port": 1883
    },
    "publish": {
      "retain": true
    }
  },
  "ble": {
    "whitelist": ["de:66:92:69:03:8c"]
  },
  "log": {
    "host": "MyOtherHost",
    "port": 5000
  }
}

Built using ESP IDF 4.1.1.

Any input would be appreciated.

dullage avatar Feb 01 '21 21:02 dullage

Hey,

First off, I haven't tried v4.1.1 myself. I see they've fixed many things revolving around BLE but who knows. I suggest trying with ESP-IDV v4.1 and see if it changes anything. I see in the logs that connecting fails a lot of times and the RSSI is low. Could it be that the ESP is far away from the blinds? Any chance to try and bring it closer for another test? Finally, there are two other errors I'm not to familiar with, service change write ccc failed and discovery on server failed. I don't know if the connection is simply dropped and that causes other issues, but perhaps we can also try whitelisting only the characteristics you're interested in to lighten the load? BTW, do you know if the BLE device uses some pairing mechanism?

Anyways, let me know how it works out for you and we can continue from there.

Thanks

shmuelzon avatar Feb 01 '21 22:02 shmuelzon

Thanks for the ideas! Here's my findings so far...

I downgraded the esp-idf to 4.1 and still had exactly the same issue.

The ESP is only about 1-2m from the blind controller with a direct line of sight. Not sure why the RSSI is so low in that particular log but I've also seen the issue with an RSSI of about -72 which from what I understand isn't too bad.

As far as I know, there's no pairing mechanism. When it works it works so I'm assuming not.

Whitelisting only the characteristics I need seems to have improved the matter though. I managed to get 3 hours without the uptime resetting yesterday evening but, unfortunately, overnight it became unresponsive again.

I restarted this morning and I'm saving the log to file. So far it's been up 5 hours. If it goes down again I can check the log to see if it's the same errors.

dullage avatar Feb 03 '21 13:02 dullage

Hey,

I'm the developer of the BLE shade used here. There is no pairing at the moment. I'm not sure if adding pairing would make this software skip the discovery on every connection. But our own hub almost never does characteristic discovery and that is one of the main things that makes it fast and stable. We never change the handle layout anymore so it should be safe to just hardcode the characteristic value handles and skip discovery.

That RSSI seems pretty low for that distance but since both the ESP32 and our Nordic module have small chip antennae it could be just positioning. They are very directional and can have weird spots of low signal. But a connection should still be ok even with -90 RSSI in my experience.

The "service change write" thing could be my fault. We have to abuse the Service Change Characteristic a bit to make common phone software work right with our FW update procedure. Since we don't have BT pairing phones do discovery on every connection. But they seem to skip discovery if the connection is just restarted. Our update procedure makes the device restart and come back up with a DFU handle layout which the phones don't notice since it happens so fast they think it was just a connection reset. So we set the Service Change Characteristic to force rediscovery after every connection. It is a hack but seems to work for now.

ratsept avatar Feb 12 '21 10:02 ratsept

Hey @ratsept, thanks for taking the time and providing us with that information. This project is meant to be generic for any BLE device so I can't skip discovery, still that should be handled by Nordic's SDK and I don't think discovery would be a problem. I have seen some peripherals misbehave with the services changed characteristics where they would not properly respond to the notification requests and I usually suggest to blacklist that service so it isn't even used. Though it's the first time I've seen that error message in the ESP's logs. In any case, I don't think what you're doing is really abusing it all that much and it still should be possible to request notifications on that characteristic.

@Dullage, do you happen to have the logs from after whitelisting the characteristics?

shmuelzon avatar Feb 15 '21 19:02 shmuelzon

Thanks for your help guys. @shmuelzon - The log below is after whitelisting the characteristics as follows:

  "ble": {
    "whitelist": ["de:66:92:69:03:8c"],
    "characteristics": {
      "whitelist": [
        "00001525-b87f-490c-92cb-11ba5ea5167c",
        "00001526-b87f-490c-92cb-11ba5ea5167c",
        "00001530-b87f-490c-92cb-11ba5ea5167c",
        "00002a19-0000-1000-8000-00805f9b34fb"
      ]
    }
  }
2021-02-16 13:07:35.049928 (BLE2MQTT-AAB8.lan): I (8693) BLE2MQTT: Discovered BLE device: de:66:92:69:03:8c (RSSI: -81), connecting
2021-02-16 13:07:35.252590 (BLE2MQTT-AAB8.lan): W (8863) BT_APPL: bta_gattc_conn_cback() - cif=1 connected=0 conn_id=1 reason=0x003e
2021-02-16 13:07:35.254021 (BLE2MQTT-AAB8.lan): E (8873) BLE: Open failed, status = 0x85
2021-02-16 13:07:35.664912 (BLE2MQTT-AAB8.lan): I (9293) BLE2MQTT: Discovered BLE device: de:66:92:69:03:8c (RSSI: -80), connecting
2021-02-16 13:07:36.900424 (BLE2MQTT-AAB8.lan): I (10533) BLE2MQTT: Discovered BLE device: 7c:bc:bc:2e:44:3b (RSSI: -85), not connecting
2021-02-16 13:07:37.816001 (BLE2MQTT-AAB8.lan): I (11333) BLE2MQTT: Discovered BLE device: 7b:b8:34:fb:64:53 (RSSI: -70), not connecting
2021-02-16 13:07:38.264564 (BLE2MQTT-AAB8.lan): I (11893) BLE2MQTT: Discovered BLE device: 02:7a:28:32:0b:2c (RSSI: -75), not connecting
2021-02-16 13:07:39.055943 (BLE2MQTT-AAB8.lan): I (12693) BLE2MQTT: Discovered BLE device: 70:73:cb:ea:eb:bb (RSSI: -64), not connecting
2021-02-16 13:07:40.266871 (BLE2MQTT-AAB8.lan): I (13893) BLE2MQTT: Discovered BLE device: 63:2e:1b:50:f8:29 (RSSI: -82), not connecting
2021-02-16 13:07:41.187834 (BLE2MQTT-AAB8.lan): I (14773) BLE2MQTT: Discovered BLE device: 70:55:33:79:5f:bf (RSSI: -93), not connecting
2021-02-16 13:07:41.935895 (BLE2MQTT-AAB8.lan): I (15573) BLE2MQTT: Discovered BLE device: 71:f0:90:97:92:14 (RSSI: -92), not connecting
2021-02-16 13:07:42.119690 (BLE2MQTT-AAB8.lan): I (15763) BLE2MQTT: Connected to device: de:66:92:69:03:8c, scanning
2021-02-16 13:07:42.326981 (BLE2MQTT-AAB8.lan): E (15963) BT_APPL: service change write ccc failed
2021-02-16 13:07:43.365727 (BLE2MQTT-AAB8.lan): I (17003) BLE2MQTT: Discovered BLE device: 75:53:aa:9f:c2:31 (RSSI: -90), not connecting
2021-02-16 13:07:43.953046 (BLE2MQTT-AAB8.lan): I (17573) BLE2MQTT: Discovered BLE device: 5f:fe:19:86:cf:79 (RSSI: -79), not connecting
2021-02-16 13:07:44.069057 (BLE2MQTT-AAB8.lan): I (17693) BLE2MQTT: Publishing: de:66:92:69:03:8c/BatteryService/BatteryLevel = 90
2021-02-16 13:07:44.470677 (BLE2MQTT-AAB8.lan): I (18073) BLE2MQTT: Publishing: de:66:92:69:03:8c/00001861-b87f-490c-92cb-11ba5ea5167c/00001525-b87f-490c-92cb-11ba5ea5167c = 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2021-02-16 13:07:44.875214 (BLE2MQTT-AAB8.lan): I (18483) BLE2MQTT: Publishing: de:66:92:69:03:8c/00001861-b87f-490c-92cb-11ba5ea5167c/00001526-b87f-490c-92cb-11ba5ea5167c = 0
2021-02-16 13:07:44.990365 (BLE2MQTT-AAB8.lan): I (18593) BLE2MQTT: Publishing: de:66:92:69:03:8c/00001861-b87f-490c-92cb-11ba5ea5167c/00001530-b87f-490c-92cb-11ba5ea5167c = 0
2021-02-16 13:07:48.060535 (BLE2MQTT-AAB8.lan): I (21653) BLE2MQTT: Discovered BLE device: 40:1a:f7:6f:d6:57 (RSSI: -78), not connecting
2021-02-16 13:07:57.083342 (BLE2MQTT-AAB8.lan): I (30693) BLE2MQTT: Discovered BLE device: 7a:35:4a:bd:1c:7b (RSSI: -87), not connecting
2021-02-16 13:07:59.936344 (BLE2MQTT-AAB8.lan): I (33573) BLE2MQTT: Discovered BLE device: 6e:36:da:40:79:88 (RSSI: -91), not connecting
2021-02-16 13:08:09.559822 (BLE2MQTT-AAB8.lan): I (43173) BLE2MQTT: Discovered BLE device: d3:07:42:2d:15:67 (RSSI: -77), not connecting
2021-02-16 13:10:11.314730 (BLE2MQTT-AAB8.lan): I (164943) BLE2MQTT: Discovered BLE device: 76:91:e2:78:50:64 (RSSI: -77), not connecting
2021-02-16 13:10:28.829815 (BLE2MQTT-AAB8.lan): I (182463) BLE2MQTT: Discovered BLE device: 43:76:be:65:a6:3d (RSSI: -93), not connecting
2021-02-16 13:11:17.810535 (BLE2MQTT-AAB8.lan): W (231443) BT_APPL: bta_gattc_conn_cback() - cif=1 connected=0 conn_id=1 reason=0x003b
2021-02-16 13:11:17.869709 (BLE2MQTT-AAB8.lan): I (231453) BLE: Connection closed, reason = 0x3b
2021-02-16 13:11:17.872998 (BLE2MQTT-AAB8.lan): I (231453) BLE2MQTT: Disconnected from device: de:66:92:69:03:8c
2021-02-16 13:11:18.998874 (BLE2MQTT-AAB8.lan): I (232623) BLE2MQTT: Discovered BLE device: de:66:92:69:03:8c (RSSI: -80), connecting
2021-02-16 13:11:19.224081 (BLE2MQTT-AAB8.lan): W (232853) BT_APPL: bta_gattc_conn_cback() - cif=1 connected=0 conn_id=1 reason=0x003e
2021-02-16 13:11:19.227191 (BLE2MQTT-AAB8.lan): E (232863) BLE: Open failed, status = 0x85
2021-02-16 13:11:19.644922 (BLE2MQTT-AAB8.lan): I (233203) BLE2MQTT: Discovered BLE device: de:66:92:69:03:8c (RSSI: -81), connecting
2021-02-16 13:11:19.698628 (BLE2MQTT-AAB8.lan): W (233333) BT_APPL: bta_gattc_conn_cback() - cif=1 connected=0 conn_id=1 reason=0x003e
2021-02-16 13:11:19.702613 (BLE2MQTT-AAB8.lan): E (233333) BLE: Open failed, status = 0x85
2021-02-16 13:11:19.818136 (BLE2MQTT-AAB8.lan): I (233433) BLE2MQTT: Discovered BLE device: de:66:92:69:03:8c (RSSI: -81), connecting
2021-02-16 13:11:19.938430 (BLE2MQTT-AAB8.lan): W (233563) BT_APPL: bta_gattc_conn_cback() - cif=1 connected=0 conn_id=1 reason=0x003e
2021-02-16 13:11:19.939873 (BLE2MQTT-AAB8.lan): E (233573) BLE: Open failed, status = 0x85
2021-02-16 13:11:20.736586 (BLE2MQTT-AAB8.lan): I (234313) BLE2MQTT: Discovered BLE device: de:66:92:69:03:8c (RSSI: -81), connecting
2021-02-16 13:11:20.842905 (BLE2MQTT-AAB8.lan): W (234443) BT_APPL: bta_gattc_conn_cback() - cif=1 connected=0 conn_id=1 reason=0x003e
2021-02-16 13:11:20.844881 (BLE2MQTT-AAB8.lan): E (234443) BLE: Open failed, status = 0x85
2021-02-16 13:11:21.970435 (BLE2MQTT-AAB8.lan): I (235593) BLE2MQTT: Discovered BLE device: de:66:92:69:03:8c (RSSI: -82), connecting
2021-02-16 13:11:22.097651 (BLE2MQTT-AAB8.lan): W (235723) BT_APPL: bta_gattc_conn_cback() - cif=1 connected=0 conn_id=1 reason=0x003e
2021-02-16 13:11:22.108531 (BLE2MQTT-AAB8.lan): E (235723) BLE: Open failed, status = 0x85
2021-02-16 13:11:22.389857 (BLE2MQTT-AAB8.lan): I (235983) BLE2MQTT: Discovered BLE device: de:66:92:69:03:8c (RSSI: -83), connecting
2021-02-16 13:11:31.329354 (BLE2MQTT-AAB8.lan): I (244873) BLE2MQTT: Connected to device: de:66:92:69:03:8c, scanning
2021-02-16 13:11:32.000653 (BLE2MQTT-AAB8.lan): I (245593) BLE2MQTT: Publishing: de:66:92:69:03:8c/BatteryService/BatteryLevel = 90
2021-02-16 13:11:32.621318 (BLE2MQTT-AAB8.lan): I (246233) BLE2MQTT: Publishing: de:66:92:69:03:8c/00001861-b87f-490c-92cb-11ba5ea5167c/00001525-b87f-490c-92cb-11ba5ea5167c = 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2021-02-16 13:11:33.645797 (BLE2MQTT-AAB8.lan): I (247273) BLE2MQTT: Publishing: de:66:92:69:03:8c/00001861-b87f-490c-92cb-11ba5ea5167c/00001526-b87f-490c-92cb-11ba5ea5167c = 0
2021-02-16 13:11:33.882063 (BLE2MQTT-AAB8.lan): I (247513) BLE2MQTT: Publishing: de:66:92:69:03:8c/00001861-b87f-490c-92cb-11ba5ea5167c/00001530-b87f-490c-92cb-11ba5ea5167c = 0
2021-02-16 13:11:41.536754 (BLE2MQTT-AAB8.lan): I (255163) BLE2MQTT: Discovered BLE device: 44:5c:ba:c4:80:82 (RSSI: -86), not connecting

dullage avatar Feb 16 '21 13:02 dullage

Did the ESP32 become unresponsive with the above log? I see you're using remote logging and not serial, so it's hard to see if anything went wrong. What I do see is that connections to the shades often fail and it also disconnects after a while, but I'm not sure if there's anything I can do about that as it's handled by the BLE library by Espressif.

@ratsept, Is there any logic in your devices to disconnect automatically, or does your hub maintain a constant connection?

shmuelzon avatar Feb 17 '21 07:02 shmuelzon

Our own hub only connects when something needs to be communicated. We try to minimize time spent in a connection because that blocks other connections for our device. We haven't implemented multiple concurrent connections for this device. But our own app for example does stay connected as long as the device is open in the app. We do have some timeouts on the device side to initiate disconnect but those are definitely longer than the few minutes this log covers. I'm not sure why so many connection attempts are failing here. I do have an ESP32 somewhere so I will try to find it and hopefully I can try the same project myself as well. I've talked to @Dullage by email and I hope he can walk me through getting this up and running as he has it right now. I can then keep the log open on the shade as well to see who is causing these issues.

@shmuelzon is the Espressif BLE library closed source or can we look at what it is doing internally if we want to?

ratsept avatar Feb 17 '21 08:02 ratsept

I do have an ESP32 somewhere so I will try to find it and hopefully I can try the same project myself as well

That would be great! Thank you for the help! I can also try to help you set up.

Is the Espressif BLE library closed source or can we look at what it is doing internally if we want to?

The application layer is based on Android's Bluedroid but the lower lever parts come as binary libraries.

shmuelzon avatar Feb 17 '21 10:02 shmuelzon

@shmuelzon - In the previous log, the ESP32 didn't completely lock up i.e. it was able to recover itself but until it did I was unable to issue commands to the blinds.

I've been capturing serial logs for a couple of days now and last night it did completely lock up i.e. it stopped logging and didn't recover. You can see the full log here. I'm not sure if it's the same thing that happens each time so, after rebooting, I left the device logging to my laptop so if it happens again I'll post the new log and we can compare.

@ratsept - Happy to help if you want to give this setup a try.

dullage avatar Feb 20 '21 10:02 dullage

It locked up again, here's the log. Similar to last time but there are some differences. Both look like the wifi disconnects and never recovers.

dullage avatar Feb 20 '21 21:02 dullage

One more log for the record. I can imagine that there's a good chance the wifi lockup part of this issue is nothing to do with this repo, if you think that's the case then I'll stop raising it here.

dullage avatar Feb 21 '21 09:02 dullage

Thanks for the logs, they all seem very similar with WiFi connection issues but something seems off to me. We get notified that we're disconnected from the network, in which case we stop MQTT, stop BLE scanning and disconnect from all BLE peripherals. Then we get a notification from the MQTT library that we disconnected, even though we shouldn't get that notification if we initiated the disconnection. At least that's how it used to be. Then we try to reconnect to WiFi and only then do we disconnect from the BLE peripheral.

Perhaps there's some issue here causing a deadlock in the WiFi or BLE stacks? First, maybe the below is redundant now and can be removed:

diff --git a/main/ble2mqtt.c b/main/ble2mqtt.c
index 583e828..4ad4ba3 100644
--- a/main/ble2mqtt.c
+++ b/main/ble2mqtt.c
@@ -188,8 +188,6 @@ static void network_on_disconnected(void)
     log_stop();
     ESP_LOGI(TAG, "Disconnected from the network, stopping MQTT");
     mqtt_disconnect();
-    /* We don't get notified when manually stopping MQTT */
-    cleanup();
 }

 /* MQTT callback functions */

Regardless, perhaps we need to wait until all BLE peripherals have disconnected and MQTT is officially disconnected as well before we try to reconnect to WiFi? Maybe it will avoid whatever's locking up...

shmuelzon avatar Feb 23 '21 20:02 shmuelzon

I got the code working on a NodeMCU (I think that's what it's called - an ESP-WROOM-32 module on a carrier board with USB). In our office with no whitelist it locks up in a few seconds. Probably due to the number of BLE devices active. With a single BLE device whitelisted it seemed to work but was still restarting occasionally.

I now brought the module home and I will try to set it up here with Home Assistant making requests to it via MQTT to log it's performance over time.

Is there a hard limit to the number of BLE devices this should work with? I can't really tell from the logs but it looks like it may keep connections to devices open. Usually there is a limit to the number of open connections a BLE central device can have.

ratsept avatar Feb 25 '21 07:02 ratsept

I tried it at home now with no whitelist as the number of BLE devices here is more reasonable. But I'm still getting very frequent reboots with lots of errors and warnings. I did use the newest 4.2 IDF but changing this doesn't seem to make much difference for me.

Here's a log of what happens right now.

ratsept avatar Feb 25 '21 12:02 ratsept

Hey @ratsept,

Is there a hard limit to the number of BLE devices this should work with?

The stack is limited to 9 connections, but it doesn't mean we have enough memory to hold that many connections. I don't put any hard limits myself, but perhaps I should.

I tried it at home now with no whitelist

Could you please add a single device to the whitelist, e.g. fd:ca:55:f7:12:af? I want to try and focus on your devices for now and avoid clutter from others.

I did use the newest 4.2 IDF

I didn't try v4.2 yet and suggest using 4.1 for now.

Thanks again for your help thus far!

shmuelzon avatar Feb 26 '21 09:02 shmuelzon

Hey @ratsept,

Any chance you can try again with the master branch and ESP-IDF v4.3? If possible, could you please also whitelist only one of the BLE shades? I want to try and separate issues with this specific peripheral as opposed to generally handling many peripherals simultaneously.

Thanks in advance!

shmuelzon avatar Jul 17 '21 13:07 shmuelzon

Hi,

Sorry for taking so long to get back to this issue. I will now be actively working on this for at least a while so I updated the SDK to the latest (ESP-IDF v4.3.1) and got the latest version of this project as well. I have whitelisted only one Shade and it still seems to fail exactly like before.

My application will actually require the ESP to only open the BLE connection intermittently every time a read is requested. So at some point I will have to fork and modify this code a lot to do exactly what I need. But for now I think I can also work on this issue with no changes to the code. I can see this crash happen pretty much randomly and it always seems to happen when doing the initial read of all the characteristics. I haven't seen this happen when I call the reads one by one later. I have a feeling that once I drop the initial read of all values and stop keeping the connection open this issue will disappear for me but the underlying problem will most likely remain in the code.

crash_1.log

ratsept avatar Oct 11 '21 08:10 ratsept