Reconnect WiFi (scan for strongest AP)
Describe the problem you have/What new integration you would like
I would like to have a function that disconnects the WiFi and then performs a new scan and connects to the strongest AP found.
Please describe your use case for this integration and alternatives you've tried:
I have an ESP32 that rides along on my robot lawnmower and keeps track of some things. I have 3 APs to cover my house and garden, but the ESP is "sticky" and stays connected to whichever AP it picks first. I know it is possible to set reboot_timeout to something quite low, but it seems unnecessary to reboot the whole ESP and lose its internal state when all I really want is a reconnect to a better-positioned AP.
Additional context
But it would not be the cleanest solution. Probably something more like a function that checks the RSSI and, if it is lower than a threshold, initiates a scan and reconnects to a stronger AP. We have been waiting for a proper implementation of roaming for years... https://esp32.com/viewtopic.php?t=3885 / https://github.com/espressif/esp-idf/issues/3671
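To make the RSSI check above a bit more concrete, here is a minimal sketch of the decision logic only. Everything in it is an assumption for illustration: `ApCandidate`, `pick_roam_target`, and the default threshold and margin are made-up names and numbers, not ESPHome or ESP-IDF API. The margin is there so the device does not flap between two APs of similar strength.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical scan entry; not an ESPHome or ESP-IDF type.
struct ApCandidate {
  std::string bssid;  // MAC address of the AP radio
  int rssi_dbm;       // received signal strength, e.g. -65
};

// Returns the BSSID to switch to, or an empty string to stay put.
// Assumptions: a switch is only considered when the current link is below
// threshold_dbm, and only taken when the best other AP beats the current
// link by at least margin_db.
std::string pick_roam_target(const std::string &current_bssid, int current_rssi_dbm,
                             const std::vector<ApCandidate> &scan,
                             int threshold_dbm = -75, int margin_db = 8) {
  assert(margin_db >= 0);
  if (current_rssi_dbm >= threshold_dbm) return "";  // link is good enough
  const ApCandidate *best = nullptr;
  for (const auto &ap : scan) {
    if (ap.bssid == current_bssid) continue;  // skip the AP we are already on
    if (best == nullptr || ap.rssi_dbm > best->rssi_dbm) best = &ap;
  }
  if (best == nullptr || best->rssi_dbm < current_rssi_dbm + margin_db) return "";
  return best->bssid;
}
```

On a real device the scan itself still interrupts traffic, so a check like this would probably have to run rarely rather than on every RSSI update.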
It'd be nice if the ESP-IDF would support 802.11r (fast roaming) but they do not, as far as I know. Without fast roaming you have to go through the whole disassociate/scan/reassociate process. @randybb linked to the open esp-idf issue tracking fast roaming support.
If they're just now working on adding it to ESP32 I can't imagine we'll ever see it for esp8266.
Agreed. Proper roaming would for sure be the best. But as that does not seem to be happening any time soon, this is a "something" (hopefully more feasible) rather than "nothing" in the meantime. Even with proper, powerful devices and a really good roaming implementation, WiFi is WiFi, and the connection will be lost or interrupted now and then. Communication IS interrupted while scanning for other APs, since there is only a single radio chip and antenna; there is no physical way around that. The more channels there are to scan, the longer it takes. With the right implementation a scan could still be really quick (scan one channel, talk again for a bit, scan the next channel, talk again, and so on), but an ESP does have limited capabilities. Good WiFi applications should always assume that the connection can drop, and buffer or resend data, but that can also be tricky to implement. For me, a hard "reconnect" would at least give better control than a full reboot.
I would like something that would periodically rescan too.
I have several ESPHome smart plugs (S31) and 3 WiFi APs throughout my house to provide sufficient coverage. Sometimes I have to reboot an AP (firmware updates, reconfiguring channel/security/VLANs, troubleshooting, or a power failure, since not all are on UPSs), and then the plugs connect back to whichever AP happens to come up first. In some cases that means connecting to the AP at the far end of the house and latching on forever.
I have tried setting multiple networks with the BSSID set for priority to the expected nearest one but this still doesn't work well if the AP boots up slower than the S31 (which is most times). It would be much better if it could properly support roaming in some way.
Using a full reboot just to rescan for WiFi is very annoying if the plug is connected to a light, and worse if it's a TV, radio, or other device that cannot tolerate a brief blip without doing a full "reboot cycle". It is probably also not good for the relay contacts with a higher-current load such as a washing machine or dishwasher.
An additional problem: as the plugs boot up, they seem to pick the first AP they see (by lowest channel number?) regardless of signal strength. This complicates the issue further... Defining 2 networks, with the BSSID-specific one at higher priority, helps but is still a poor workaround. Proper periodic scanning and selecting by signal strength would correct this problem too.
Vote for this too, I'd like to have any such functionality as well!
Remember to upvote FR using the 👍 on OP
I mean the current priority system already does do that, no?
If you have multiple matching networks with the same priority, they will automatically be chosen in a round-robin fashion because on each disconnect the previous network gets a -1 priority penalty.
If a device connects to a wrong network, it will stay there as long as it's still connected. But as soon as the wifi connection drops it will automatically choose the best network again.
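As a toy model of the penalty behaviour described above (not ESPHome's actual code), the sketch below subtracts 1 from the priority of the AP in use on every disconnect and then reselects the highest-priority entry. `KnownAp`, `select_best`, and `simulate` are hypothetical names, and the model deliberately ignores signal strength so the round-robin effect is easy to see.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record of a known AP; for illustration only.
struct KnownAp {
  std::string name;
  float priority;
};

// Index of the highest-priority entry (the first one wins ties).
std::size_t select_best(const std::vector<KnownAp> &aps) {
  assert(!aps.empty());
  std::size_t best = 0;
  for (std::size_t i = 1; i < aps.size(); ++i) {
    if (aps[i].priority > aps[best].priority) best = i;
  }
  return best;
}

// Simulates `disconnects` link drops. Each drop penalises the AP in use
// by 1 and then reselects. Returns the sequence of AP names joined.
std::vector<std::string> simulate(std::vector<KnownAp> aps, int disconnects) {
  std::vector<std::string> joined;
  std::size_t current = select_best(aps);
  joined.push_back(aps[current].name);
  for (int i = 0; i < disconnects; ++i) {
    aps[current].priority -= 1.0f;
    current = select_best(aps);
    joined.push_back(aps[current].name);
  }
  return joined;
}
```

With three entries named near, mid, and far, all at priority 0, three disconnects give the sequence near, mid, far, near: every entry is visited once before the first one is chosen again.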
I mean the current priority system already does do that, no?
No, it does not.
Critical distinction: "strongest AP" is NOT the same as "strongest network". You can have many APs (BSSIDs) on one network (SSID). This is increasingly common with mesh networks and prosumer infrastructure.
If you have multiple matching networks with the same priority, they will automatically be chosen in a round-robin fashion because on each disconnect the previous network gets a -1 priority penalty.
This doesn't help when they are the same network with multiple access points (same SSID, many BSSIDs to provide redundancy, distribute load, and improve coverage): the ESP locks onto a weaker BSSID when a stronger one is available and never switches to the stronger one. In my experience, on bootup the ESP also frequently does not pick the strongest BSSID for the specified SSID (it seems to go by lowest channel number, not signal, when choosing a BSSID on a given SSID???).
If a device connects to a wrong network, it will stay there as long as it's still connected. But as soon as the wifi connection drops it will automatically choose the best network again.
That is part of the problem, especially with multiple BSSIDs on the same SSID. It should scan periodically (or when the signal drops below a threshold) and reconnect to the strongest one. If it doesn't pick the strongest BSSID to begin with, it never tries again either.
There is already a way to have it report back its signal strength; there should be a way to have a drop below a threshold trigger a scan/reconnect without rebooting (which interrupts whatever it's doing, for example cycling the relay on a smart plug and making your TV/radio/whatever reboot too). At least then it would be possible to build an automation that makes it retry periodically without having to reboot.
Additionally, if it is insisting on using an AP with a poor signal (say at the far end of your house as I often observe) and causing many re-transmitted packets due to low (but not enough to disconnect) signal quality, that introduces significant performance overhead and impact not only to the ESP device but all other devices on the same access point.
I'm suffering from this now. Has anyone dug into the required changes already?
Tasmota fixed this a couple of years ago, but it is disabled by default... (SetOption56/SetOption57, see https://github.com/arendst/Tasmota/issues/3173). Absolutely needed for ESPHome, without it the devices are nearly guaranteed to do the wrong thing on multi-AP networks. That is causing me big headaches currently.
Ah, workaround for static setups (not the riding lawnmower mentioned above): define the bssid in the WiFi settings. Not as elegant as an automatic scan though, plus it requires one to guess in edge cases.
Sorry, that was nonsense. The BSSID steers the network, not the specific AP.
I expect the BSSID thing to work; however, it will stick to one AP. I will test it.
Sorry, that was nonsense. The BSSID steers the network, not the specific AP.
The BSSID is what steers the AP (it's the wireless radio's MAC address); the SSID is the network.
Workaround for non-moving is to set the BSSID in network settings BUT if the access point goes offline you still need it to scan for any SSID as a backup (lower priority network). Then I had to make switches, sensors, and automations to reboot the ESPs any time an AP goes offline to force them to re-scan for the one they are supposed to be on.
Anyway, what otro said should work, so this is a bug and not a feature request. Just to make that clear. I cannot debug it now: no time, and my problematic nodes are hard to debug over serial for now.
As @randybb mentioned, the ideal would be a typical roaming implementation, but in case that's too heavy a lift while we wait for upstream, here are some thoughts:
-- 1 -- Perhaps add a setting for minimum RSSI, for when the ESP sees multiple BSSIDs (same SSID).
For example, one of mine is currently connected to an AP with an RSSI of -90 dBm, while there is an AP far closer at -65 dBm. If I were able to set a minimum, the -90 dBm AP would always be ignored during the scan (unless it is the only one available, in which case log it).
That would likely be easier than the avg user determining the BSSID.
-- 2 -- Second thought is if the ESP is not selecting the strongest BSSID during initial SCAN... I would hope that's something that can be rectified. I have about 30 of these now, each client that has a weak connection (so lower data rates) slows the network for all clients on that channel...
If that's really the case (esp doesn't select BSSID based on signal) we should really add an alert/note in the wifi component documentation to warn users with multi-AP networks.
-- 3 -- Third: an action such as wifi.rescan that can be called by on_value from the signal strength sensor.
Again the most ideal is a traditional scan/thresholds and proper roaming support.
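Suggestion 1 on its own could look roughly like the sketch below. It is only an illustration under assumptions: `ScanEntry`, `choose_ap`, and the `min_rssi_dbm` parameter are hypothetical and do not exist in ESPHome. It keeps the scan order as it is (for example by channel), skips entries below the minimum, and falls back to the first entry if nothing qualifies, setting a flag so the caller can log that.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical scan entry; for illustration only.
struct ScanEntry {
  std::string bssid;
  int rssi_dbm;
};

// Returns the first entry in scan order whose RSSI is at or above
// min_rssi_dbm. If none qualifies, falls back to the first entry and sets
// *used_fallback so the caller can log it. Returns nullptr for an empty scan.
const ScanEntry *choose_ap(const std::vector<ScanEntry> &scan, int min_rssi_dbm,
                           bool *used_fallback) {
  assert(used_fallback != nullptr);
  *used_fallback = false;
  for (const auto &entry : scan) {
    if (entry.rssi_dbm >= min_rssi_dbm) return &entry;
  }
  if (scan.empty()) return nullptr;
  *used_fallback = true;  // only weak APs are visible
  return &scan.front();
}
```

With the -90 dBm and -65 dBm example from above and a minimum of -80 dBm, the -90 dBm entry is skipped even though it comes first in the scan. Combining this with suggestion 2 (sorting by signal before choosing) would make the minimum matter mostly for the log message.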
Running into this issue also. Did the initial flash in the house, then moved esp32 to garage where there is enough signal to see the original AP, but the connection is very unreliable. Even when the esp32 completely falls off the network, it won't try to connect to the AP that's in the garage because it can still see the one inside. I had to take my laptop to the garage and do the initial programming on a different esp32 in the garage to get it to use the AP in there.
Forcing it to a specific BSSID will not resolve the issue because if that AP ever fails and it tries to connect to one of the APs furthest away, I'll be right back in the same situation.
The ESP8266 has somewhat better WiFi reception with PCB antennas than the ESP32. In the house I don't have problems (I have an AP for every 50 m²), but outside is another story. I have been using ESP32s with external antennas (ESP32-WROOM-32U) without any problems where ESP32s with PCB antennas would have trouble.
I have observed #2 with my Sonoff S31 plugs on ESPHome...they go by the lowest channel number they see regardless of signal strength...which in some cases means picking the worst signal.
Really hope something can be figured out better than the mess of improvised BSSID and manually rebooting when it "might" be on a different AP...
I have the same problem: 3 APs in a mesh configuration and 1 AP from my internet provider. The mesh does a periodic reset every night, so the ESP devices connect to the internet provider's AP, which is a much less stable connection. The priority system complicates things further: an AP that undergoes a reset cycle gets its priority lowered... So even if the unit is connected to the best AP on the first day, on the next day it won't return to it. Only after a few connect/disconnect cycles can the best AP be selected again. The nodes will connect to every available AP before returning to the first one.
I tried to implement a background scan on the ESP8266; as mentioned above, it interrupts the connection. Each channel takes about 100 ms to scan, so I thought that scanning one channel every 10 s would be a reasonable compromise: a small delay for events to be sent, and a full scan every few minutes (14 channels × 10 s/channel = 140 s for a full scan; 100 ms / 10 s = 1% of the time during which an event can be delayed by up to 100 ms).
I thought that the TCP/IP stack would handle a small interruption in the link, but the system became unstable. The short interruption in the WiFi link caused the TCP/IP connection to drop, probably because the hardware reports an error when an operation is cancelled or refused during the scan. Each disconnect leads to a heap allocation, and the stack waits about 10 minutes before releasing the old connection's memory. According to the TCP/IP docs this is a feature and not a bug, but for the ESP8266 it is catastrophic because the heap is too small for it.
The same happens if a full scan is initiated too often. The current implementation supports re-scanning, but it must be used sparingly.
Why does the API use TCP/IP? Why not use UDP? It uses fewer resources, and the API messages are small, so a datagram, event-driven approach is simpler than the streaming approach used by TCP/IP. Use a periodic PING to verify that the connection is alive, and an ACK to make sure the data is received (if required by the sensor).
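The arithmetic in the paragraph above can be written out as a small helper, which makes it easy to try other intervals. `ScanBudget` and `scan_budget` are names made up for this sketch, and the 100 ms dwell time is the rough figure quoted above, not a measured constant.

```cpp
#include <cassert>

// Result of the back-of-the-envelope calculation.
struct ScanBudget {
  double full_scan_seconds;  // time until every channel has been visited once
  double offline_fraction;   // share of time the radio is away from the AP
};

// channels:   number of channels to visit (14 in the example above)
// interval_s: time between two single-channel scans, in seconds
// dwell_s:    time spent off-channel for one scan, in seconds
ScanBudget scan_budget(int channels, double interval_s, double dwell_s) {
  assert(channels > 0 && interval_s > 0.0 && dwell_s >= 0.0);
  return ScanBudget{channels * interval_s, dwell_s / interval_s};
}
```

Calling `scan_budget(14, 10.0, 0.1)` reproduces the figures above: 140 s for a full pass and about 1% of the time off-channel.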
UDP would complicate other things, it is not a reliable protocol ("best effort" is often reliable enough in a quiet network, but if the packet is lost it will never retry...would need to rewrite applications to keep retrying and checking if it worked...much more effort). If it was controlling a light show with a nonstop stream of commands, UDP would be better but for simple on off and button press TCP is the reasonable option.
I'm curious what "unstable" meant - maybe there are some timeouts or buffers that can be adjusted somewhere? I would have thought it would "look" like high latency (which should be fine, and can naturally exist).
I wouldn't even be that upset if it couldn't do full-on roaming if there was a way to do a scheduled reboot say middle of the night and it properly found the best AP...but as it currently stands they seem to prefer the first by channel # rather than signal strength.
See https://github.com/esp8266/Arduino/issues/4213 or http://www.serverframework.com/asynchronousevents/2011/01/time-wait-and-its-design-implications-for-protocols-and-scalable-servers.html. ~150 bytes is not a lot, but when the heap is small it gets fragmented, which may eventually lead to a malloc failure.
I might be wrong about why the heap is running out - but I got a strong correlation between network errors and heap running out. It sometimes comes back, if no network error occurs for a long time - so it isn't a normal memory leak
Seems like a super reasonable feature to want, but with some technical limitations. Perhaps start with a rescan that is only available as a callable action?
As mentioned before, this was solved by Tasmota years ago. I am running 22 ESP8266s and 4 mesh APs successfully with Tasmota and I am looking into migrating to ESPHome. But without some working "roaming" support it will definitely not work, because I know the troubles I had with Tasmota until the "poor man's wifi roaming" was implemented.
It is probably not a very clean solution, but it works rock stable for years. All ESPs are always connected to the best AP, after AP reboot they switch to the next best AP, and after a fixed timespan (22 minutes iirc) they move back to the original AP if the RSSI is significantly better.
I was checking the signal strength of some of my devices and found they were not connected to the nearest AP even though the signal was poor. I assumed they would reconnect to the best AP every 5 minutes like Tasmota does, but then I discovered this request!
Even a few reboots did not help; ESPHome still connected to the distant AP even though there was a much closer one. I had to put fast_connect: off in the config (which I believe should be the default anyway), and now it's connected to the nearest AP.
Will be good to get this feature in ESPHome, I was migrating from Tasmota for a few devices but need to pause that now...
Any news on this? Anyone with a workaround or something?
For a workaround, using API callable services, you can try something like this:
api:
  services:
    - service: scan_wifi
      then:
        - lambda: |-
            wifi::global_wifi_component->start_scanning();
    - service: scan_reset
      then:
        - lambda: |-
            // Reset old priorities for known networks
            for (auto &scan : wifi::global_wifi_component->get_scan_result()) {
              if (wifi::global_wifi_component->has_sta_priority(scan.get_bssid())) {
                wifi::global_wifi_component->set_sta_priority(scan.get_bssid(), 0);
              }
            }
The priority of a network drops every time it disconnects from the AP. In my setup, the mesh routers reset every night (don't ask), so the priority score is useless and just causes the unit to choose the wrong AP.
I've tried multiple ways of calling start_scanning periodically to keep track of the AP with the strongest RSSI, but sometimes the scan causes the link to Home Assistant to disconnect, and sometimes it even causes ESPHome to reboot.
For now I've added the following patch to handle the nightly AP resets.
time:
  - platform: homeassistant
    id: homeassistant_time
    on_time:
      # Every 30 minutes, at early morning
      - seconds: 0
        minutes: /30
        hours: 4-6
        then:
          - lambda: |-
              if (wifi::global_wifi_component->wifi_rssi() < -60) {
                // PATCH: Reset old priorities for known networks
                for (auto &scan : wifi::global_wifi_component->get_scan_result()) {
                  if (wifi::global_wifi_component->has_sta_priority(scan.get_bssid())) {
                    wifi::global_wifi_component->set_sta_priority(scan.get_bssid(), 0);
                  }
                }
                // Rescan
                wifi::global_wifi_component->start_scanning();
              }
It looks like 802.11k & v are now supported, and I believe this would help.
Does the linked sample code aid in integrating into ESPHome?
https://github.com/esphome/esphome/pull/3600
👍