Arduino-HomeKit-ESP8266

'No Response' after WiFi router reboot or SSID turned off

Open • dlangamer opened this issue 3 years ago • 24 comments

Everything works fine until the network's WiFi router goes down or restarts. The ESP serial output reports a successful reconnection, but the devices can no longer communicate with the Home app (red "No Response" message). The only way to get them working again is to restart the ESP. Is anyone else experiencing this problem?

Below is the serial output:

SketchSize: 476624 B
FreeSketchSpace: 1617920 B
FlashChipSize: 4194304 B
FlashChipRealSize: 4194304 B
FlashChipSpeed: 40000000
SdkVersion: 2.2.2-dev(38a443e)
FullVersion: SDK:2.2.2-dev(38a443e)/Core:3.0.1=30001000/lwIP:STABLE-2_1_2_RELEASE/glue:1.2-48-g7421258/BearSSL:c0b69df
CpuFreq: 160MHz
FreeHeap: 46656 B
ResetInfo: External System
ResetReason: External System
OFF
>>> [    100] HomeKit: Starting server
>>> [    104] HomeKit: Using existing accessory ID: EF:B9:5C:77:D4:96
>>> [    110] HomeKit: Found admin pairing with 19052CD2-AA12-4409-A7C0-26A07A1E5030, disabling pair setup
>>> [    119] HomeKit: Configuring MDNS
>>> [    122] HomeKit: Init server over
>>> [   1327] HomeKit: heap: 46552, sockets: 0
>>> [   4841] HomeKit: WiFi connected, ip: 192.168.0.23, mask: 255.255.255.0, gw: 192.168.0.1
>>> [   4850] HomeKit: Configuring MDNS
>>> [   4854] HomeKit: MDNS begin: ESP8266_LED_06EB3A, IP: 192.168.0.23
>>> [   6401] HomeKit: heap: 45120, sockets: 0
>>> [   7611] HomeKit: Got new client: local 192.168.0.23:5556, remote 192.168.0.18:49238
>>> [   7620] HomeKit: [Client 1073680492] Pair Verify Step 1/2
>>> [   7939] HomeKit: Free heap: 42600
>>> [   8144] HomeKit: [Client 1073680492] Pair Verify Step 2/2
>>> [   8151] HomeKit: [Client 1073680492] Found pairing with 19052CD2-AA12-4409-A7C0-26A07A1E5030
>>> [   8174] HomeKit: Call ge_double_scalarmult_vartime_lowmem in ge_low_mem.c
>>> [   8905] HomeKit: [Client 1073680492] Verification successful, secure session established
>>> [   8913] HomeKit: Free heap: 42712
>>> [   9122] HomeKit: [Client 1073680492] Get Accessories
>>> [   9419] HomeKit: [Client 1073680492] Update Characteristics
>>> [  11440] HomeKit: heap: 43280, sockets: 1
>>> [  12851] HomeKit: [Client 1073680492] Get Characteristics
>>> [  13266] HomeKit: [Client 1073680492] Get Characteristics
>>> [  16305] HomeKit: [Client 1073680492] Update Characteristics
ON
>>> [  16516] HomeKit: heap: 42776, sockets: 1
>>> [  17326] HomeKit: [Client 1073680492] Update Characteristics
OFF
>>> [  18342] HomeKit: [Client 1073680492] Update Characteristics
ON
>>> [  18755] HomeKit: [Client 1073680492] Update Characteristics
OFF
>>> [  21577] HomeKit: heap: 42776, sockets: 1
>>> [  26610] HomeKit: heap: 42776, sockets: 1
>>> [  31628] HomeKit: heap: 42776, sockets: 1
>>> [  36646] HomeKit: heap: 42776, sockets: 1
>>> [  41662] HomeKit: [Client 1073680492] Disconnected!      <     WIFI DISCONNECT - NO RESPONSE AFTER THIS POINT
>>> [  41667] HomeKit: [Client 1073680492] Closing client connection
>>> [  41673] HomeKit: heap: 45256, sockets: 0
>>> [  46692] HomeKit: heap: 45280, sockets: 0
>>> [  51713] HomeKit: heap: 45280, sockets: 0
>>> [  56733] HomeKit: heap: 45280, sockets: 0
>>> [  61750] HomeKit: heap: 45280, sockets: 0
>>> [  66766] HomeKit: heap: 44896, sockets: 0
>>> [  69017] HomeKit: WiFi connected, ip: 192.168.0.23, mask: 255.255.255.0, gw: 192.168.0.1
>>> [  69025] HomeKit: Configuring MDNS
>>> [  69032] HomeKit: MDNS restart: ESP8266_LED_06EB3A, IP: 192.168.0.23
>>> [  71788] HomeKit: heap: 45000, sockets: 0
>>> [  76819] HomeKit: heap: 44496, sockets: 0
>>> [  81842] HomeKit: heap: 44704, sockets: 0
>>> [  86863] HomeKit: heap: 44704, sockets: 0
>>> [  91881] HomeKit: heap: 44704, sockets: 0
>>> [  96898] HomeKit: heap: 44704, sockets: 0

dlangamer, Jul 19 '21 17:07

I, too, have recently experienced this problem. Originally I thought it was a problem with my code, but it seems the problem may be elsewhere. I've been thinking about writing a public server_free function for the following scenarios:

  1. Unload on OTA start.
  2. Unload/reload on WiFi disconnect?
  3. Unload/reload on request (maybe via HTTP; then again, ESP.restart() might suffice).

I did want to mention that I have Arduino statsd instrumented as well, and the statsd reporting keeps working after a reconnect. So the core loop is operating as intended; only the HomeKit server stops responding after a reconnect.
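For scenario 2 in the list above, the first piece is noticing the moment WiFi comes back. A minimal sketch, assuming the check runs once per loop(): on an ESP8266 the input would be `WiFi.status() == WL_CONNECTED`, but here it is a plain bool so the edge detection can be tried off-device. ReconnectDetector is a hypothetical helper, not part of this library.

```cpp
#include <cassert>

// Hypothetical helper: reports the disconnected -> connected edge once.
// On hardware, call update(WiFi.status() == WL_CONNECTED) from loop().
class ReconnectDetector {
 public:
  // Returns true exactly once per reconnect, but not for the very first
  // connection after boot (that one is handled by normal setup).
  bool update(bool connected) {
    bool reconnected = connected && !was_connected_ && has_connected_before_;
    if (connected) {
      has_connected_before_ = true;
    }
    was_connected_ = connected;
    return reconnected;
  }

 private:
  bool was_connected_ = false;         // status seen on the previous call
  bool has_connected_before_ = false;  // true after the first connection
};
```

When update() returns true, the idea above would be to unload and reload the HomeKit server at that point. The library has no public server_free(), so that step remains hypothetical.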

dsbaha, Jul 20 '21 16:07

The problem with using ESP.restart() is that the device loses its current state: e.g., a light bulb driven by a relay would turn off, and the user would have to switch it on again. Yesterday I tried unloading the entire WiFi stack and reconnecting, but that didn't work either. I believe the problem lies in how the session/socket is torn down.

I'm studying all the code to try to find a solution, but I'm not an advanced programmer. If anyone has any ideas, please help.

dlangamer, Jul 21 '21 16:07

I found a workaround for the reconnection issue. In the source file "arduino_homekit_server.cpp", in the function "void arduino_homekit_setup", I added a check for whether the mDNS service is active. If it isn't, the code calls homekit_server_init (the accessory pairing setup), which as a result also starts the mDNS service. It works well for now; I'm checking that there won't be any side effects.

void arduino_homekit_setup(homekit_server_config_t *config) {
	//ESP32 use FreeRTOS-task
	//xTaskCreate(esp32_homekit_task, "HomeKit Server", SERVER_TASK_STACK, config, 1, NULL);
	/*
	if (system_get_cpu_freq() != SYS_CPU_160MHZ) {
		system_update_cpu_freq(SYS_CPU_160MHZ);
		INFO("Update the CPU to run at 160MHz");
	}*/

	// Only (re)initialize when mDNS is not running. Posted as
	// `homekit_mdns_started = false`, an assignment that is always false.
	if (!homekit_mdns_started) {
		homekit_server_init(config);
	}
	// ...
}

dlangamer, Jul 26 '21 19:07

I'm also trying to find a workaround and would like to try your solution. I'm a bit of a noob and confused: why did you add this snippet of code

if (homekit_mdns_started = false) { homekit_server_init(config); }

into "void arduino_homekit_setup" as opposed to, let's say, "void arduino_homekit_loop"? Isn't setup only run once, when the ESP restarts? Another thing that confuses me is that "void arduino_homekit_setup" already runs "homekit_server_init(config)" (at least in the ESP8266 library) regardless of the mDNS status, and the issue is still there. Thank you for your time and answers.
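The condition `homekit_mdns_started = false` in the posted snippet is an assignment, not a comparison: it stores false into the flag and the `if` then tests that stored value, so the body never runs, and the flag is cleared as a side effect. The stand-alone sketch below makes that visible; mdns_started, init_calls, and the setup_* functions are illustrative stand-ins, not library code.

```cpp
#include <cassert>

// Illustrative stand-ins for the library's global flag and init call.
static bool mdns_started = true;
static int init_calls = 0;

static void fake_server_init() { ++init_calls; }

// As posted: '=' assigns false, so the condition is always false and the
// flag is silently overwritten. (Extra parentheses silence -Wparentheses.)
void setup_as_posted() {
  if ((mdns_started = false)) {
    fake_server_init();
  }
}

// Intended: compare, and only re-initialize when mDNS is not running.
void setup_intended() {
  if (!mdns_started) {
    fake_server_init();
  }
}
```

Either way, this code sits in setup(), which runs once per boot, so it cannot react to a later WiFi reconnect.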

tomdmt, Sep 05 '21 23:09

+1, I have this problem too. I have tested 2 WiFi routers with the simplest code.

ruleechen, Oct 03 '21 13:10

I am linking another relevant issue https://github.com/Mixiaoxiao/Arduino-HomeKit-ESP8266/issues/103

ruleechen, Oct 07 '21 14:10

Guys, I have got the magic! Just remove this line.

[screenshot: the MDNS.close(); line in homekit_mdns_init(), arduino_homekit_server.cpp]

ruleechen, Oct 11 '21 14:10

Huh, do you have a working theory as to why this works?

Edit: I made two identical devices, one with this change and one without. I'm glad to say that the one with the change did not disconnect a single time over several weeks, whereas the one without it disconnected multiple times. I'm so happy; I've been looking for a fix to this issue for months, and this is the only thing that works.

tomdmt, Oct 11 '21 16:10

On iOS 15.1 I still have this problem. It did not help.

seenve, Oct 20 '21 18:10

Is this the best solution so far? Any feedback yet? I am going to rebuild my devices in the coming days, and I have run into this problem in the past, so I am really looking for a robust fix too :-)

jenspr, Jan 10 '22 11:01

I changed the code to //MDNS.close(), but the error is the same. I then set up monitoring of ESP reboots on ThingSpeak: https://thingspeak.com/channels/1286808/charts/3?bgcolor=%23ffffff&color=%23d62020&days=1&dynamic=false&results=100&title=Reinicialização+do+sistema&type=line&yaxis=boot The problem persists...

mateusmsantin, Feb 15 '22 13:02

@mateusmsantin

Give my version a try: https://github.com/ruleechen/home-switch/blob/main/extras/arduino_homekit_server.cpp It has a few small customizations and will probably help. Good luck.

ruleechen, Feb 15 '22 13:02

Hi @ruleechen! I tried replacing arduino_homekit_server.cpp, but I'm getting an error. Does this version work with ArduinoOTA?

Multiple libraries were found for "ArduinoOTA.h"
 Used: /Users/test/Library/Arduino15/packages/esp8266/hardware/esp8266/3.0.2/libraries/ArduinoOTA
 Not used: /Users/test/Documents/Arduino/libraries/ArduinoOTA
Multiple libraries were found for "arduino_homekit_server.h"
 Used: /Users/test/Documents/Arduino/libraries/HomeKit_ESP8266-1.2.0
 Not used: /Users/test/Documents/Arduino/libraries/HomeKit-ESP8266
/Users/test/Documents/Arduino/libraries/HomeKit_ESP8266-1.2.0/src/arduino_homekit_server.cpp-OLD.cpp:17:10: fatal error: homekit_base64.h: No such file or directory
 Not used: /Users/test/Documents/Arduino/libraries/ESPHap
   17 | #include "homekit_base64.h"
      |          ^~~~~~~~~~~~~~~~~~
compilation terminated.

mateusmsantin, Feb 17 '22 00:02

@mateusmsantin This doesn't seem to be related to ArduinoOTA. Please change 'homekit_base64.h' in the file to 'base64.h'; that is a customization in my environment.

ruleechen, Feb 17 '22 01:02

@ruleechen Great work! It has been running very well: about 4 hours without a reboot.

mateusmsantin, Feb 17 '22 15:02

@ruleechen Thanks! It's working, but it can't restore the previous relay state after a power off/on. Is there any solution? When power goes out and comes back, it should restore the previous on/off state. Please help!

Sanjayc1806, Feb 23 '22 07:02

@Sanjayc1806 Keeping state is beyond the scope of this library. I'm afraid we have to do it ourselves: save the on/off state, and restore it from the saved value after a reboot.
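A minimal sketch of that save/restore idea, assuming a small flash-backed byte area: a std::array stands in for it here so the logic runs off-device; on an ESP8266 the accesses would go through the core's EEPROM.begin(), EEPROM.read(), EEPROM.write() and EEPROM.commit(). The offset and marker value are arbitrary, hypothetical choices; pick an offset that does not overlap storage used by the HomeKit library.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout: one marker byte, then one state byte.
constexpr std::size_t kStateOffset = 0;
constexpr std::uint8_t kMarker = 0xA5;  // means "a state was saved here"

using Storage = std::array<std::uint8_t, 16>;  // stand-in for EEPROM bytes

void save_relay_state(Storage& eeprom, bool on) {
  eeprom[kStateOffset] = kMarker;
  eeprom[kStateOffset + 1] = on ? 1 : 0;
  // On hardware: EEPROM.commit() here so the bytes actually reach flash.
}

// Returns the saved state, or `fallback` if nothing valid was saved yet
// (e.g. a fresh chip whose erased bytes read back as 0xFF).
bool restore_relay_state(const Storage& eeprom, bool fallback) {
  if (eeprom[kStateOffset] != kMarker) {
    return fallback;
  }
  return eeprom[kStateOffset + 1] == 1;
}
```

In use, the HomeKit setter callback would call save_relay_state() after switching the relay, and setup() would apply restore_relay_state() to both the relay pin and the switch characteristic's value before starting the server.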

ruleechen, Feb 23 '22 07:02

@ruleechen Yeah, we can do that using EEPROM, and it's working fine with the buttons on the local web server page. I should try that with the HomeKit code.

Thanks for your version of the code... it's working great now.

Sanjayc1806, Feb 23 '22 07:02

@Sanjayc1806 You're welcome. Mine has been up for over 10 days now.

ruleechen, Feb 23 '22 07:02

Hi @ruleechen, could you provide the modified code for ESP32? Thank you in advance!

ayush9upta, Jun 18 '22 18:06

I'm facing the same issue.

After finding this page, I reviewed homekit_mdns_init() in arduino_homekit_server.cpp and MDNSResponder::close() in LEAmDNS.cpp.

A possible theory:

  1. MDNS.close() releases all services added by addService().
  2. On boot/reboot, homekit_mdns_started is always false, so the addService() calls after the // MDNS.close(); line are executed.
  3. After a WiFi reconnect, homekit_mdns_started is always true, so if we call MDNS.close(), no service is left active.
  4. I guess this service is what HA uses to wait for and accept connections.
  5. With no service active, HA can't connect to the homekit_server.
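The theory above can be replayed with a toy model. This is a sketch under stated assumptions: FakeMdns is an in-memory stand-in, not the LEAmDNS MDNSResponder API, and mdns_init() only mimics the branch structure described in this thread (the first call registers the _hap._tcp service, later calls take the restart path and return early). As point 1 says, close() is assumed to drop every registered service.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy responder: tracks only what the theory needs (not the real API).
struct FakeMdns {
  bool running = false;
  std::vector<std::string> services;
  bool begin() { running = true; return true; }
  void close() { running = false; services.clear(); }  // assumption: drops all
  void addService(const std::string& s) { services.push_back(s); }
};

// Assumed shape of homekit_mdns_init(), reconstructed from the thread.
// `call_close` toggles the line that the fix removes.
void mdns_init(FakeMdns& mdns, bool& started, bool call_close) {
  if (started) {  // restart path, taken after a WiFi reconnect
    if (call_close) {
      mdns.close();
    }
    mdns.begin();
    return;  // services are not re-added on this path
  }
  started = mdns.begin();  // first call after boot
  mdns.addService("_hap._tcp");
}
```

Calling mdns_init() twice (boot, then reconnect) with call_close set leaves the service list empty, matching the "No Response" symptom; without close() the registration from boot survives.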

kendling, Jun 24 '22 08:06

I think you might be on to something here. It looks like when WiFi reconnects we need to re-advertise those services, and we're simply not doing it. I'm going to experiment with just removing the condition and clearing/adding the services every time mdns_init is called.

Further evidence:

  // The MDNS needs to be restarted when WiFi is connected to confirm the
  // MDNS runs at the IPAddress of STA
  // otherwise the iOS will not show the Accessory

But if, as you point out, the services are actually gone, they won't be re-advertised, so HomeKit isn't aware they exist. My guess is that if we reboot soon enough (due to some watchdog condition or a memory leak), we simply re-init, but if we don't reboot for some time, HomeKit assumes we're gone.
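A sketch of the change proposed in this comment (drop the already-started condition and clear/re-add the service on every call), using a hypothetical in-memory stand-in rather than the real MDNS object. Because each call tears everything down before registering again, a reconnect can neither leave the advertisement empty nor register the service twice.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy responder used only for illustration (not the LEAmDNS API).
struct FakeMdns {
  bool running = false;
  std::vector<std::string> services;
  bool begin() { running = true; return true; }
  void close() { running = false; services.clear(); }
  void addService(const std::string& s) { services.push_back(s); }
};

// Proposed shape: no "already started" shortcut. Boot and every WiFi
// reconnect go through the same close -> begin -> addService sequence.
void mdns_init_always_readvertise(FakeMdns& mdns) {
  mdns.close();
  mdns.begin();
  mdns.addService("_hap._tcp");
}
```

The trade-off is a brief gap in the advertisement on every reconnect, which seems acceptable since the accessory was unreachable during the WiFi outage anyway.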

paullj1, Feb 25 '23 14:02

Okay, so digging deeper into the actual mDNS announcements, I noticed that the device IDs are changing, which explains everything. The attempts so far have focused on optimizing the mDNS code itself, but that doesn't solve anything if the accessory ID changes (which appears to happen randomly).

The problem I’ve been having isn’t that the device gets disconnected, it’s that when the device comes back online, it’s failing to find its ID, and generating a new random one. To the network, it’s a brand new device.

It looks like my problem is that something is colliding with the address space in the EEPROM that this library is using for the ID.

Hope this helps someone else!
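A toy model of the failure described above, with a hypothetical layout: the accessory ID is only trusted when a marker value stored alongside it is intact. The marker value, struct, and function name are illustrative assumptions, not the library's storage format; the fresh ID is passed in rather than generated randomly so the behavior is deterministic. The sample ID is the one from the serial log at the top of this issue.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

constexpr std::uint32_t kMagic = 0xC0FFEE01;  // hypothetical marker value

struct StoredConfig {
  std::uint32_t magic = 0;   // 0 = never written, or blanked
  std::string accessory_id;  // e.g. "EF:B9:5C:77:D4:96"
};

// Boot-time lookup: reuse the stored ID only if the marker is intact.
std::string load_or_create_id(StoredConfig& cfg, const std::string& fresh_id) {
  if (cfg.magic == kMagic) {
    return cfg.accessory_id;  // normal case: same ID as before the reboot
  }
  // Marker missing or clobbered: the old ID is treated as invalid and a new
  // one is persisted, so iOS sees a brand-new, unpaired accessory.
  cfg.magic = kMagic;
  cfg.accessory_id = fresh_id;
  return cfg.accessory_id;
}
```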

paullj1, Feb 25 '23 14:02

Okay, so it doesn't look like I was overlapping after all. It seems to be an error-handling issue in the "compact_data" function. Any time there's a topology change, HomeKit notifies the device to update its pairing info, and this happens often now with the new HomeKit topology changes. The bad news: if the device reboots or fails to read from flash for any reason (preemption, watchdog, etc.), the function blanks out the "magic" bytes, completely invalidating the config/pairing data, and never repairs them. The boot process then thinks the device ID is invalid and generates a new one. HomeKit looks for the old device ID and can't find it, and the device advertises itself as a brand-new device.

Working on a fix. Will post it as a PR when done.
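One reading of the compact_data failure described above, as a hypothetical model: if the rewrite invalidates the marker first and the device resets before it is restored, the next boot finds no valid pairing data. The second function shows a common ordering for crash-safe flash updates (write a spare copy, mark it valid, then retire the old one). It is a general technique, not the library's code or the fix in preparation; a real version would also need a generation counter to choose between two valid copies. `steps` simulates how many steps complete before a reset.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

constexpr std::uint32_t kMagic = 0xC0FFEE01;  // hypothetical marker value

struct Slot {
  std::uint32_t magic = 0;  // valid only when equal to kMagic
  std::string pairings;
};

// Risky order: invalidate, rewrite, re-validate in place.
void compact_in_place(Slot& s, const std::string& compacted, int steps) {
  if (steps < 1) return;
  s.magic = 0;               // 1: marker blanked
  if (steps < 2) return;
  s.pairings = compacted;    // 2: data rewritten
  if (steps < 3) return;
  s.magic = kMagic;          // 3: marker restored
}

// A/B order: the old copy stays valid until the new one is marked valid.
void compact_ab(Slot& active, Slot& spare, const std::string& compacted,
                int steps) {
  if (steps < 1) return;
  spare.pairings = compacted;  // 1: new copy written, not yet trusted
  if (steps < 2) return;
  spare.magic = kMagic;        // 2: new copy becomes valid
  if (steps < 3) return;
  active.magic = 0;            // 3: old copy retired
}

bool has_valid_copy(const Slot& a, const Slot& b) {
  return a.magic == kMagic || b.magic == kMagic;
}
```

With the in-place order, a reset after step 1 or 2 leaves no valid copy; with the A/B order, some valid copy exists after every step.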

paullj1, Feb 25 '23 17:02