esp32-ble2mqtt icon indicating copy to clipboard operation
esp32-ble2mqtt copied to clipboard

Add periodic reboot/reboot on wifi disconnection option.

Open Yersi88 opened this issue 4 years ago • 17 comments

Is your feature request related to a problem? Please describe. Device goes offline randomy after at least few hours. Disconnects from MQTT and can't load web interface too. Solution is to power cycle the device.

Describe the solution you'd like Option to have device restarted at periodic intervals. And/or a device to reboot after it has disconnected from the network.

Describe alternatives you've considered N/A

Additional context N/A

Yersi88 avatar Mar 21 '22 11:03 Yersi88

I also have problems with my ESP32 disconnecting and not coming online again. The reboot would be a welcome workaround but an actual solution would also be appreciated.

giddyhup avatar Mar 25 '22 11:03 giddyhup

Hey,

Do either of you have any serial logs to share when the device becomes unresponsive? Preferably with debug logs enabled. It should give some insight as to what's going on.

Thanks!

shmuelzon avatar Mar 25 '22 11:03 shmuelzon

Thanks for noticing this. I have only remote logging set up. The BLE device I am monitoring is very chatty. I use the default bin and see lots of Bluetooth message but nothing else.

giddyhup avatar Mar 25 '22 13:03 giddyhup

@shmuelzon perhaps you could provide a bin where less/no bluetooth messages are being logged. I would try to catch the ESP32 module's log output on a Raspberry Pi or an ESP8266. From what my research shows the ESP32 not reconnecting seems to be a more common issue.

giddyhup avatar Apr 07 '22 18:04 giddyhup

Hey,

@giddyhup Please find attached binaries with some log messages removed. One's with the default INFO level debugs while the other is DEBUG. I've removed some of my debug messages but you'll probably still see some from the other modules.

ble2mqtt.debug.bin.gz ble2mqtt.info.bin.gz

I'm interested in what you've found regarding the ESP32 not reconnecting. Anything concrete for the root cause or any workarounds?

shmuelzon avatar Apr 08 '22 05:04 shmuelzon

@shmuelzon thanks for the binaries. It may take some time for me to work with them but I will report back.

Regarding re-connection issues. I noticed this comment in wifi.c:

This is a workaround as ESP32 WiFi libs don't currently auto-reassociate.

Googling the text shows other people dealing with it

giddyhup avatar Apr 08 '22 07:04 giddyhup

I did first what could be done without much infrastructure change. I remote-flashed the DEBUG image and collected the log remotely. The flash procedure ended with a warm reboot. I used remote logging and after about a minute encountered a disconnect. In the past I experienced disconnects only about once in 12 hours so I was either 'lucky' or something was triggered (wild idea: could it be DHCP renewal?).

The log showed no abnormalities. The last entry is the discovery of a BLE device that the module did not connect to (I only have one address whitelisted). After a forced manual reset via the module's reset button the messages appearing in the remote log still show no abnormalities. There is just some MQTT connection and subscribe chatter and listing of BLE devices.

As I feared, I really have to find a solution for direct logging from a Raspberry Pi to get what is going on when the module disconnects.

giddyhup avatar Apr 08 '22 09:04 giddyhup

Unfortunately, you're right. Debugging WiFi issues can't be done over WiFi. This is a reoccurring problem and some ESP-IDF versions seem more stable than others. To the best of my knowledge I'm not doing anything special with the way I handle WiFi but I know that using it in conjunction with Bluetooth causes issues. I have other projects that use the exact same WiFi code (but don't include Bluetooth) that remain connected for weeks without issues... I've considered trying to switch from Bluedroid to nimBLE but I don't know if it will make any difference since both use the same PHY.

shmuelzon avatar Apr 08 '22 10:04 shmuelzon

Please find attached a redacted excerpt where I de-authed the module on purpose to see what happens.

A panic happened, the module rebooted and reconnected. So here, the result at least was desirable. I had remote logging still enabled but disabled it now for the next tests. esp32ble.log

giddyhup avatar Apr 09 '22 11:04 giddyhup

Thanks for that. It looks like we've run or of memory (?) But, it might be related to the fact that debug logs are enabled, it takes its toll...

shmuelzon avatar Apr 09 '22 13:04 shmuelzon

Apparently serial logging fixes the reconnection issues. : )

The module is online since I started logging. I'll keep logging.

giddyhup avatar Apr 10 '22 13:04 giddyhup

Apparently serial logging fixes the reconnection issues. : )

The module is online since I started logging. I'll keep logging.

How do I enable this accidental fix?

Yersi88 avatar Apr 10 '22 13:04 Yersi88

I was joking, but if you want to try it yourself here's what I did:

I flashed (OTA) ble2mqtt.debug.bin.gz and connected my ESP32 module via its USB port to a Raspberry Pi. On the Pi I use minicom to collect the module's serial output and write it to a file.

giddyhup avatar Apr 10 '22 13:04 giddyhup

BTW - Not sure if relevant to your setup, I once had an ESP32 connected to an RPI2 (I think) and it would often fail and get stuck in a boot loop but when I moved it to another machine (with USB3) it didn't happen again. I'm guessing it might be related to the amount of current the USB was able to provide.

shmuelzon avatar Apr 10 '22 14:04 shmuelzon

@shmuelzon, thanks for pointing that out.

Initially, my ESP32 was connected to a USB charging brick that provides 5V/1A or more. Currently, it is connected to a Pi 4's USB 2 port. So I guess it is currently getting less juice while before (where it had issues reconnecting). Perhaps later will connect it to the charger again and see if failing reconnects reoccur.

giddyhup avatar Apr 10 '22 14:04 giddyhup

@shmuelzon I was offline for a few days and am now back to continue with this issue.

I downgraded to the official version (16) and can confirm that the issue returns. In the log these entries keep repeating:


I (87334265) WiFi: Disconnected
I (87334265) BLE2MQTT: Disconnected from the network, stopping MQTT
I (87334265) MQTT: Disconnecting MQTT client
W (87334265) wifi:m f probe req l=0

W (87334395) wifi:m f probe req l=0

W (87334515) wifi:m f probe req l=0

W (87334635) wifi:m f probe req l=0

W (87334755) wifi:m f probe req l=0

W (87334875) wifi:m f probe req l=0

W (87334995) wifi:m f probe req l=0

W (87335125) wifi:m f probe req l=0

W (87335245) wifi:m f probe req l=0

W (87335365) wifi:m f probe req l=0

W (87335485) wifi:m f probe req l=0

I (87335845) WiFi: Disconnected
I (87335845) BLE2MQTT: Disconnected from the network, stopping MQTT
I (87335845) MQTT: Disconnecting MQTT client
W (87335845) wifi:m f probe req l=0

giddyhup avatar Apr 25 '22 07:04 giddyhup

I am using the debug version again and a few minutes ago the BT connection to the (single whitelisted) device I am listening to got disconnected. Initially, when I checked the web interface the request appeared in the log but during handling the web request the module disconnected:

D (12974347) httpd: httpd_server: processing listen socket 54
D (12974347) httpd: httpd_accept_conn: newfd = 57
D (12974347) httpd_sess: httpd_sess_new: fd = 57
D (12974347) httpd_sess: httpd_sess_new: active sockets: 1
D (12974357) httpd: httpd_accept_conn: complete
D (12974357) httpd: httpd_server: doing select maxfd+1 = 58
D (12974367) httpd: httpd_process_session: processing socket 57
D (12974367) httpd_sess: httpd_sess_process: httpd_req_new
D (12974377) httpd_txrx: httpd_recv_with_opt: requested length = 128
D (12974387) httpd_txrx: httpd_recv_with_opt: received length = 128
D (12974387) httpd_parse: read_block: received HTTP request block size = 128
D (12974397) httpd_parse: cb_url: message begin
D (12974397) httpd_parse: cb_url: processing url = /
D (12974407) httpd_parse: verify_url: received URI = /
D (12974407) httpd_parse: cb_header_field: headers begin
D (12974417) httpd_txrx: httpd_unrecv: length = 112
D (12974417) httpd_parse: pause_parsing: paused
D (12974427) httpd_parse: cb_header_field: processing field = Host
D (12974437) httpd_txrx: httpd_recv_with_opt: requested length = 128
D (12974437) httpd_txrx: httpd_recv_with_opt: pending length = 112
D (12974447) httpd_parse: read_block: received HTTP request block size = 112
D (12974457) httpd_parse: continue_parsing: skip pre-parsed data of size = 5
D (12974457) httpd_parse: continue_parsing: un-paused
D (12974467) httpd_parse: cb_header_value: processing value = ble2mqtt-d4ac.lan
D (12974477) httpd_parse: cb_header_field: processing field = User-Agent
D (12974477) httpd_parse: cb_header_value: processing value = Mozilla/5.0 (Wind9
D (12974487) httpd_parse: parse_block: parsed block size = 112
D (12974497) httpd_txrx: httpd_recv_with_opt: requested length = 128
D (12974507) httpd_txrx: httpd_recv_with_opt: received length = 128
D (12974507) httpd_parse: read_block: received HTTP request block size = 128
D (12974517) httpd_parse: cb_header_value: processing value = 9.0
D (12974527) httpd_parse: cb_header_field: processing field = Accept
D (12974527) httpd_parse: cb_header_value: processing value = text/html,applica8
D (12974547) httpd_parse: cb_header_field: processing field = Accept-Language
D (12974547) httpd_parse: cb_header_value: processing value = en
D (12974557) httpd_parse: cb_header_field: processing field = Accept-
D (12974567) httpd_parse: parse_block: parsed block size = 240
D (12974567) httpd_txrx: httpd_recv_with_opt: requested length = 128
D (12974577) httpd_txrx: httpd_recv_with_opt: received length = 101
D (12974587) httpd_parse: read_block: received HTTP request block size = 101
D (12974587) httpd_parse: cb_header_field: processing field = Encoding
D (12974597) httpd_parse: cb_header_value: processing value = gzip, deflate
D (12974607) httpd_parse: cb_header_field: processing field = DNT
D (12974607) httpd_parse: cb_header_value: processing value = 1
D (12974617) httpd_parse: cb_header_field: processing field = Connection
D (12974617) httpd_parse: cb_header_value: processing value = keep-alive
D (12974627) httpd_parse: cb_header_field: processing field = Upgrade-Insecure-s
D (12974637) httpd_parse: cb_header_value: processing value = 1
D (12974637) httpd_parse: cb_header_field: processing field = Sec-GPC
D (12974647) httpd_parse: cb_header_value: processing value = 1
D (12974657) httpd_parse: cb_headers_complete: bytes read     = 359
D (12974657) httpd_parse: cb_headers_complete: content length = 0
D (12974667) httpd_parse: pause_parsing: paused
D (12974667) httpd_parse: cb_no_body: message complete
D (12974677) httpd_parse: httpd_parse_req: parsing complete
D (12974687) httpd_uri: httpd_uri: request for / with type 1
D (12974687) httpd_uri: httpd_find_uri_handler: [0] = /restart
D (12974697) httpd_uri: httpd_find_uri_handler: [1] = /status
D (12974697) httpd_uri: httpd_find_uri_handler: [2] = /ble/bonding_db
D (12974707) httpd_uri: httpd_find_uri_handler: [3] = /ble/devices
D (12974717) httpd_uri: httpd_find_uri_handler: [4] = /ota/firmware
D (12974717) httpd_uri: httpd_find_uri_handler: [5] = /ota/configuration
D (12974727) httpd_uri: httpd_find_uri_handler: [6] = /fs/*
D (12974727) httpd_uri: httpd_find_uri_handler: [7] = /fs/*
D (12974737) httpd_uri: httpd_find_uri_handler: [8] = /fs/*
D (12974747) httpd_uri: httpd_find_uri_handler: [9] = /
D (12974747) HTTPD: Handling route for /
D (12974747) httpd_txrx: httpd_resp_set_hdr: new header = Content-Encoding: gzip
D (12974757) httpd_txrx: httpd_send_all: sent = 64
D (12974767) httpd_txrx: httpd_send_all: sent = 16
D (12974767) httpd_txrx: httpd_send_all: sent = 2
D (12974777) httpd_txrx: httpd_send_all: sent = 4
D (12974777) httpd_txrx: httpd_send_all: sent = 2
D (12974787) httpd_txrx: httpd_send_all: sent = 2
D (12974787) httpd_txrx: httpd_send_all: sent = 1055
D (12974797) httpd_sess: httpd_sess_process: httpd_req_delete
D (12974797) httpd_sess: httpd_sess_process: success
D (12974807) httpd: httpd_server: doing select maxfd+1 = 58
I (12980727) wifi:bcn_timout,ap_probe_send_start
W (12981737) wifi:m f null

I (12983237) wifi:ap_probe_send over, resett wifi status to disassoc
I (12983237) wifi:state: run -> init (c800)
D (12983237) wifi:connect status 5 -> 3
D (12983237) wifi:send diassoc for beacon timeout(200)
D (12983247) wifi:sta leave
I (12983247) wifi:pm stop, total sleep time: 197398855 us / 233968858 us

D (12983247) wifi:stop CSA timer
W (12983257) wifi:<ba-del>idx
W (12983257) wifi:<ba-del>idx
D (12983257) wifi:remove xx:xx:xx:xx:cd:2e from rc list
I (12983267) wifi:new:<6,0>, old:<6,0>, ap:<255,255>, sta:<6,0>, prof:1
D (12983267) wifi:filter: set rx policy=8
D (12983277) wifi:Send disconnect event, reason=200, AP number=0
D (12983277) wifi:connect status 3 -> 0

giddyhup avatar Apr 28 '22 11:04 giddyhup