Mesh communication delayed in routerless mesh network (AEGHB-277)

Open HWuest opened this issue 2 years ago • 19 comments

If a routerless mesh network is set up, mesh message communication is disturbed regularly.

Every few seconds, communication stops for several seconds (after the pause, the pending messages are processed in one block).

It seems that a WLAN router scan takes place on a regular schedule, and that no mesh communication can take place while it is scanning.

This delay is critical for our application (real-time communication for deafblind people over a mesh-network-based input/output device), where fluent communication is essential.

If a router is connected, the disturbance does not seem to occur, so the mesh network is in principle able to fulfill the task.

Our mobile devices need a routerless configuration (which was reported as one of the special features of mesh-lite).

My issue #13 about how to set up a routerless network correctly has not been answered yet, and the current disturbance prevents us from starting beta testing of the devices with deafblind people.

A quick, short answer would therefore be appreciated.

The main question is: how can the interruption of communication (the scanning task, if this is the root cause) be stopped?

A workaround would also be helpful so that live testing is not delayed further (it is nearly impossible to explain this unexpected behaviour to deafblind people).

HWuest avatar Jul 06 '23 09:07 HWuest

Hi @HWuest, I apologize for the delayed response. I will promptly work on providing you with sample code to optimize the creation of a routerless mesh network and reduce communication latency.

tswen avatar Jul 11 '23 08:07 tswen

Thank you for your help.

As mentioned in AEGHB-238, I have to call esp_mesh_lite_set_allowed_level(1); on one device (making this device the root); otherwise the other devices will not connect to the mesh network if no router is available.

Is this the correct way to start a routerless network, and do I need to call esp_mesh_lite_set_disallowed_level(1); on the other devices?

HWuest avatar Jul 11 '23 08:07 HWuest

Yes, for a routerless network, it is essential to specify a root node. Otherwise, the devices that power on later will not know whom to connect to, and it can potentially lead to a circular connection, which can cause significant issues.

tswen avatar Jul 11 '23 09:07 tswen

You can use this branch to test the no_router scenario with some internal optimizations that have greatly reduced communication latency.

tswen avatar Jul 12 '23 09:07 tswen

What is the easiest way to exchange the new libraries in my existing Espressif-IDE project?

At the moment I have espressif__mesh_lite in my managed_components folder, using lib version libesp_mesh_lite_esp32s2.a :4a6dd88

Should I simply replace the libraries and version information, or is there an automatic way of downloading/switching to the branch in the IDE?

HWuest avatar Jul 12 '23 12:07 HWuest

For a short test I moved mesh_lite to the component folder and replaced the library libesp_mesh_lite_esp32s2.a :4a6dd88 with the new one (libesp_mesh_lite_esp32s2.a :c559758) in my project. I did a rebuild with a full project clean and checked that the new library was used. The behaviour didn't change much.

I added some test code that produces a round-robin message exchange from client to root and back, including a time measurement. (Short explanation of the test code: every time the root node receives a message, it immediately sends it back. When the client receives a message, it sends a new message to the root, to generate a continuous message stream as fast as possible. An initial message from the client to the root starts the process. The client send time and the round-robin time are measured.)
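
The delay calculation on the client side can be sketched roughly as follows. This is a minimal sketch, not the test code used for the measurements: the helper name round_trip_ms is hypothetical, and it assumes that the timestamp is a free-running 32-bit millisecond counter which the root echoes back unchanged.

```c
#include <stdint.h>

/* Hypothetical helper (not part of esp-mesh-lite): round-trip time in ms
 * for a message whose send timestamp was echoed back unchanged by the root.
 * Unsigned subtraction stays correct across one wrap of the 32-bit counter. */
uint32_t round_trip_ms(uint32_t sent_ms, uint32_t now_ms)
{
    return now_ms - sent_ms;
}
```

Using unsigned subtraction rather than comparing the two timestamps keeps the result valid even if the counter overflows between sending and receiving.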

Here are the resulting time measurements (time values in ms):

Message timing, parent to root and back, with new lib, with active router connection (same behaviour with old lib):

    W (40422) DHS: Send Time: 40834 Msg-Delay 10
    W (40432) DHS: Send Time: 40844 Msg-Delay 9
    W (40442) DHS: Send Time: 40854 Msg-Delay 9
    W (40452) DHS: Send Time: 40863 Msg-Delay 9
    W (40462) DHS: Send Time: 40872 Msg-Delay 9
    W (40472) DHS: Send Time: 40881 Msg-Delay 12
    W (40482) DHS: Send Time: 40894 Msg-Delay 10
    W (40492) DHS: Send Time: 40905 Msg-Delay 9
    W (40502) DHS: Send Time: 40915 Msg-Delay 9
    W (40512) DHS: Send Time: 40924 Msg-Delay 9
    W (40532) DHS: Send Time: 40934 Msg-Delay 11
    W (40532) DHS: Send Time: 40946 Msg-Delay 8
    W (40542) DHS: Send Time: 40955 Msg-Delay 9
    W (40552) DHS: Send Time: 40964 Msg-Delay 9
    W (40562) DHS: Send Time: 40973 Msg-Delay 10
    W (40572) DHS: Send Time: 40984 Msg-Delay 9
    W (40582) DHS: Send Time: 40993 Msg-Delay 9
    W (40592) DHS: Send Time: 41002 Msg-Delay 9
    W (40602) DHS: Send Time: 41011 Msg-Delay 9
    W (40612) DHS: Send Time: 41021 Msg-Delay 9
    W (40622) DHS: Send Time: 41031 Msg-Delay 9
    W (40632) DHS: Send Time: 41040 Msg-Delay 10
    W (40642) DHS: Send Time: 41051 Msg-Delay 10
    W (40652) DHS: Send Time: 41061 Msg-Delay 11

... communication continues for several seconds without data loss (max_retry = 0)....

Message timing, parent to root and back, with old lib, without router connection:

    W (70114) DHS: Send Time: 16879 Msg-Delay 13
    W (70124) DHS: Send Time: 16886 Msg-Delay 14
    W (70134) DHS: Send Time: 16893 Msg-Delay 13
    W (70134) DHS: Send Time: 16900 Msg-Delay 13
    W (70144) DHS: Send Time: 16907 Msg-Delay 14
    W (70154) DHS: Send Time: 16914 Msg-Delay 14
    W (70154) DHS: Send Time: 16921 Msg-Delay 13
    W (70164) DHS: Send Time: 16929 Msg-Delay 13
    W (70174) DHS: Send Time: 16935 Msg-Delay 13
    W (70174) DHS: Send Time: 16942 Msg-Delay 13
    W (70184) DHS: Send Time: 16949 Msg-Delay 13
    W (70314) DHS: Send Time: 16956 Msg-Delay 132
    W (72084) DHS: Send Time: 17089 Msg-Delay 1772
    W (72094) DHS: Send Time: 18861 Msg-Delay 9
    W (73624) DHS: Send Time: 19885 Msg-Delay 511
    W (73624) DHS: Send Time: 20397 Msg-Delay 9

communication stops because of data loss!

Message timing, parent to root and back, with new lib, without router connection:

    W (51414) DHS: Send Time: 51745 Msg-Delay 129
    W (51424) DHS: Send Time: 51875 Msg-Delay 9
    W (51434) DHS: Send Time: 51884 Msg-Delay 9
    W (51554) DHS: Send Time: 51894 Msg-Delay 128
    W (51574) DHS: Send Time: 52022 Msg-Delay 11
    W (51574) DHS: Send Time: 52034 Msg-Delay 9
    W (51584) DHS: Send Time: 52043 Msg-Delay 8
    W (51714) DHS: Send Time: 52052 Msg-Delay 128
    W (51724) DHS: Send Time: 52181 Msg-Delay 9
    W (51734) DHS: Send Time: 52190 Msg-Delay 9
    W (51864) DHS: Send Time: 52199 Msg-Delay 130
    W (51874) DHS: Send Time: 52330 Msg-Delay 10
    W (51884) DHS: Send Time: 52340 Msg-Delay 9
    W (52014) DHS: Send Time: 52350 Msg-Delay 128
    W (52024) DHS: Send Time: 52478 Msg-Delay 8
    W (52034) DHS: Send Time: 52487 Msg-Delay 9
    W (52044) DHS: Send Time: 52497 Msg-Delay 9
    W (52174) DHS: Send Time: 52506 Msg-Delay 128
    W (52184) DHS: Send Time: 52635 Msg-Delay 9
    W (52194) DHS: Send Time: 52644 Msg-Delay 9
    W (52564) DHS: Send Time: 52654 Msg-Delay 372
    W (52574) DHS: Send Time: 53026 Msg-Delay 9
    W (52584) DHS: Send Time: 53035 Msg-Delay 9

communication stops because of data loss!

Result: With a router connection, communication is nearly stable, with a maximum round-robin time of around 10 ms.

Without a router, the round-robin time varies between 9 ms and up to 370 ms (new lib; with the old lib above 2 seconds), with sudden complete message losses (max_retry = 0).

With the new lib I couldn't reproduce the very long delays of several seconds, so there seems to be an improvement, but now the delay increases to above 100 ms every few hundred ms, which is not the case when a router is connected. There still seems to be some bottleneck in the routerless mode.
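
To compare runs like these more systematically, the Msg-Delay values could be condensed into a few numbers. The following is only a sketch under assumptions: the type delay_stats_t, the function summarize_delays, and the idea of counting samples above a fixed limit are all hypothetical and not part of the test code or of esp-mesh-lite.

```c
#include <stddef.h>
#include <stdint.h>

/* Summary of a series of round-trip delays (all values in ms). */
typedef struct {
    uint32_t min_ms;
    uint32_t max_ms;
    size_t   over_limit;   /* number of samples strictly above the limit */
} delay_stats_t;

/* Hypothetical helper: scan n delay samples and count those above limit_ms.
 * For n == 0 all fields are returned as 0. */
delay_stats_t summarize_delays(const uint32_t *delays_ms, size_t n,
                               uint32_t limit_ms)
{
    delay_stats_t s = {0, 0, 0};
    for (size_t i = 0; i < n; ++i) {
        uint32_t d = delays_ms[i];
        if (i == 0 || d < s.min_ms) s.min_ms = d;
        if (i == 0 || d > s.max_ms) s.max_ms = d;
        if (d > limit_ms) ++s.over_limit;
    }
    return s;
}
```

For the first eight samples of the routerless run with the new lib (129, 9, 9, 128, 11, 9, 8, 128) and a limit of 100 ms, this reports a minimum of 8 ms, a maximum of 129 ms, and three samples over the limit.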

Here is part of the message-sending code I used (to show the payload of the messages):

    cJSON_AddStringToObject(item, "send", "");                   // empty data field
    cJSON_AddStringToObject(item, "mac", mac);                   // mac address
    cJSON_AddNumberToObject(item, "time", (double)timer_u32());  // time at message send
    cJSON_AddBoolToObject(item, "fromRoot", false);              // message from client or root

    esp_mesh_lite_try_sending_msg((char*)"rec", (char*)"rec_ack", 0, item, &esp_mesh_lite_send_msg_to_root);

or

    esp_mesh_lite_try_sending_msg((char*)"rec", (char*)"rec_ack", 0, item, &esp_mesh_lite_send_broadcast_msg_to_child);

HWuest avatar Jul 12 '23 14:07 HWuest

Addition: With a max_retry of 1 the data loss does not occur, but the long delay times above 2 seconds come back:

    W (331242) DHS: Send Time: 9167 Msg-Delay 370
    W (331252) DHS: Send Time: 9175 Msg-Delay 370
    W (331252) DHS: Send Time: 9538 Msg-Delay 13
    W (331262) DHS: Send Time: 9545 Msg-Delay 13
    W (331272) DHS: Send Time: 9552 Msg-Delay 12
    W (331632) DHS: Send Time: 9558 Msg-Delay 371
    W (332022) DHS: Send Time: 9930 Msg-Delay 389
    W (332032) DHS: Send Time: 10320 Msg-Delay 9
    W (332042) DHS: Send Time: 10330 Msg-Delay 9
    W (332052) DHS: Send Time: 10339 Msg-Delay 10
    W (332062) DHS: Send Time: 10350 Msg-Delay 9
    W (332072) DHS: Send Time: 10359 Msg-Delay 9
    W (332082) DHS: Send Time: 10369 Msg-Delay 10
    W (332092) DHS: Send Time: 10379 Msg-Delay 9
    W (332102) DHS: Send Time: 10388 Msg-Delay 9
    W (332112) DHS: Send Time: 10398 Msg-Delay 9
    W (332122) DHS: Send Time: 10408 Msg-Delay 9
    W (332132) DHS: Send Time: 10417 Msg-Delay 9
    W (332142) DHS: Send Time: 10426 Msg-Delay 9
    W (332152) DHS: Send Time: 10436 Msg-Delay 12
    W (332162) DHS: Send Time: 10449 Msg-Delay 9
    W (332172) DHS: Send Time: 10458 Msg-Delay 9
    W (332182) DHS: Send Time: 10467 Msg-Delay 9
    W (332192) DHS: Send Time: 10477 Msg-Delay 9
    W (332202) DHS: Send Time: 10486 Msg-Delay 9
    W (334472) DHS: Send Time: 10486 Msg-Delay 2278
    W (334472) DHS: Send Time: 10496 Msg-Delay 2275
    W (334482) DHS: Send Time: 12765 Msg-Delay 13
    W (334492) DHS: Send Time: 12771 Msg-Delay 13
    W (334492) DHS: Send Time: 12778 Msg-Delay 13
    W (334862) DHS: Send Time: 12785 Msg-Delay 371
    W (335372) DHS: Send Time: 13156 Msg-Delay 513

HWuest avatar Jul 12 '23 14:07 HWuest

Hi, due to the use of UDP for internal communication, packet loss is possible, so we do not recommend using this kind of back-and-forth interactive message flow. If any packets are lost due to network fluctuations or other reasons, it can lead to communication interruptions as shown in the log above.

tswen avatar Jul 17 '23 07:07 tswen

Hi, yes, sure, the interactive way of communicating was only a short test program written to measure and show you the delays that occur in message processing. To get the raw communication times, I did the first measurements with max_retry 0. The last measurement, with max_retry 1, shows that message loss seems to coincide with the long delay times. The retry leads to delays above 2 seconds.

The main question remains: why is the behaviour so different with and without a router (constantly below 12 ms without any message loss, versus up to 400 ms without retry and some message losses)? To me it looks as if the root node still regularly scans the WLAN network to find a router, which blocks message communication for a while. If this is the case, is there a way to stop this scanning process?

Can you also give me a hint on my question "What is the easiest way to exchange the new libraries in my existing Espressif-IDE project?" Manually copying the library from Git to my project folder does not seem to be the optimal way.

HWuest avatar Jul 17 '23 08:07 HWuest

You just need to replace the original mesh_lite component of your project with the latest mesh_lite component.

tswen avatar Jul 19 '23 02:07 tswen

https://github.com/tswen/esp-mesh-lite/tree/examples/add_no_router_example/components/mesh_lite

The component has been updated. Additionally, for the "no_router" scenario, the following configuration must be performed.

    esp_mesh_lite_config_t mesh_lite_config = ESP_MESH_LITE_DEFAULT_INIT();
    mesh_lite_config.join_mesh_ignore_router_status = true;
#if CONFIG_MESH_ROOT
    mesh_lite_config.join_mesh_without_configured_wifi = false;
#else
    mesh_lite_config.join_mesh_without_configured_wifi = true;
#endif
    esp_mesh_lite_init(&mesh_lite_config);

#if CONFIG_MESH_ROOT
    ESP_LOGI(TAG, "Root node");
    esp_mesh_lite_set_allowed_level(1);
#else
    ESP_LOGI(TAG, "Child node");
#endif

tswen avatar Jul 19 '23 02:07 tswen

I replaced the component with your latest version.

I adopted your configuration in my code and changed the test software according to your comment as follows:

  1. The child node sends a message with the current timestamp to the root node
  2. The root node receives the message and sends it back to the child node
  3. The child node reports the delay between send and receive time on the console
  4. After 200 ms the child sends the next test message
  5. This repeats endlessly

When I start the devices with a configured and accessible hotspot (PC with WLAN hotspot on), the delay is about 10 ms; the maximum is 14 ms, and seldom up to 28 ms (a retry when a message was lost due to the UDP protocol). This is the expected behaviour.

When I switch off the WLAN hotspot, or start without an active WLAN hotspot, the situation changes. Most of the time the delay is also about 10 ms, but every 10 seconds the delay increases, sometimes to above 500 ms, for 5 seconds. For the next 5 seconds the average goes back down to 10 ms.

Switching the hotspot on again heals the process a few seconds after the root node has re-established the WLAN connection.

The behaviour is independent of the Wi-Fi provisioning scan method (fast or all-channel scan), which I thought would have an influence if the problem came from the scanning process. So the problem seems to be somewhere else. Is there anything in wifimesh_lite that is timed with a 10-second interval?

Have you been able to reproduce this behaviour on your side?

Here is a measurement result of the delays when the hotspot is off (with some comments):

    W (38182) DHS: Send Time: 38595 Msg-Delay 9
    W (38392) DHS: Send Time: 38795 Msg-Delay 9
    W (38582) DHS: Send Time: 38995 Msg-Delay 10
    W (38782) DHS: Send Time: 39195 Msg-Delay 9
    W (38982) DHS: Send Time: 39395 Msg-Delay 9
    W (39192) DHS: Send Time: 39595 Msg-Delay 11
    W (39382) DHS: Send Time: 39795 Msg-Delay 9
    W (39582) DHS: Send Time: 39995 Msg-Delay 9
    W (39792) DHS: Send Time: 40195 Msg-Delay 11
    W (39992) DHS: Send Time: 40395 Msg-Delay 12
    W (40182) DHS: Send Time: 40595 Msg-Delay 9
    W (40392) DHS: Send Time: 40795 Msg-Delay 11
    W (40592) DHS: Send Time: 40995 Msg-Delay 12
    W (40782) DHS: Send Time: 41195 Msg-Delay 9
    W (40982) DHS: Send Time: 41395 Msg-Delay 9
    W (41182) DHS: Send Time: 41595 Msg-Delay 9
    W (41382) DHS: Send Time: 41795 Msg-Delay 9
    W (41592) DHS: Send Time: 41995 Msg-Delay 13
    W (41782) DHS: Send Time: 42195 Msg-Delay 9
    W (41982) DHS: Send Time: 42395 Msg-Delay 9
    W (42182) DHS: Send Time: 42595 Msg-Delay 9
    W (42382) DHS: Send Time: 42795 Msg-Delay 9
    W (42582) DHS: Send Time: 42995 Msg-Delay 9
    W (42792) DHS: Send Time: 43195 Msg-Delay 12
    W (43092) DHS: Send Time: 43395 Msg-Delay 115 <-- start of disturbance
    W (43242) DHS: Send Time: 43595 Msg-Delay 64
    W (43392) DHS: Send Time: 43795 Msg-Delay 16
    W (43692) DHS: Send Time: 43995 Msg-Delay 119
    W (43852) DHS: Send Time: 44195 Msg-Delay 71
    W (44002) DHS: Send Time: 44395 Msg-Delay 21
    W (44542) DHS: Send Time: 44595 Msg-Delay 366
    W (44552) DHS: Send Time: 44795 Msg-Delay 171
    W (44932) DHS: Send Time: 44995 Msg-Delay 357
    W (44942) DHS: Send Time: 45195 Msg-Delay 162
    W (45322) DHS: Send Time: 45395 Msg-Delay 345
    W (45332) DHS: Send Time: 45595 Msg-Delay 150
    W (45382) DHS: Send Time: 45795 Msg-Delay 9
    W (45622) DHS: Send Time: 45995 Msg-Delay 47
    W (45782) DHS: Send Time: 46195 Msg-Delay 9
    W (46082) DHS: Send Time: 46395 Msg-Delay 102
    W (46232) DHS: Send Time: 46595 Msg-Delay 55
    W (46382) DHS: Send Time: 46795 Msg-Delay 9
    W (46682) DHS: Send Time: 46995 Msg-Delay 107
    W (46842) DHS: Send Time: 47195 Msg-Delay 64
    W (46992) DHS: Send Time: 47395 Msg-Delay 11
    W (47372) DHS: Send Time: 47595 Msg-Delay 199
    W (47392) DHS: Send Time: 47797 Msg-Delay 8
    W (47772) DHS: Send Time: 47995 Msg-Delay 193
    W (47782) DHS: Send Time: 48195 Msg-Delay 9
    W (48162) DHS: Send Time: 48395 Msg-Delay 180 <-- 5 seconds after disturbance start
    W (48192) DHS: Send Time: 48595 Msg-Delay 10
    W (48382) DHS: Send Time: 48795 Msg-Delay 9
    W (48582) DHS: Send Time: 48995 Msg-Delay 9
    W (48782) DHS: Send Time: 49195 Msg-Delay 9
    W (48992) DHS: Send Time: 49395 Msg-Delay 12
    W (49182) DHS: Send Time: 49595 Msg-Delay 9
    W (49382) DHS: Send Time: 49795 Msg-Delay 9
    W (49582) DHS: Send Time: 49995 Msg-Delay 9
    W (49782) DHS: Send Time: 50195 Msg-Delay 9
    W (49982) DHS: Send Time: 50395 Msg-Delay 9
    W (50182) DHS: Send Time: 50595 Msg-Delay 9
    W (50382) DHS: Send Time: 50795 Msg-Delay 9
    W (50582) DHS: Send Time: 50995 Msg-Delay 9
    W (50782) DHS: Send Time: 51195 Msg-Delay 9
    W (50982) DHS: Send Time: 51395 Msg-Delay 9
    W (51182) DHS: Send Time: 51595 Msg-Delay 9
    W (51392) DHS: Send Time: 51795 Msg-Delay 10
    W (51582) DHS: Send Time: 51995 Msg-Delay 9
    W (51782) DHS: Send Time: 52195 Msg-Delay 9
    W (51992) DHS: Send Time: 52395 Msg-Delay 10
    W (52182) DHS: Send Time: 52595 Msg-Delay 9
    W (52382) DHS: Send Time: 52795 Msg-Delay 9
    W (52582) DHS: Send Time: 52995 Msg-Delay 9
    W (52792) DHS: Send Time: 53195 Msg-Delay 11
    W (53092) DHS: Send Time: 53395 Msg-Delay 116 <-- next disturbance again 5 seconds later
    W (53242) DHS: Send Time: 53595 Msg-Delay 65
    W (53402) DHS: Send Time: 108 Msg-Delay 26
    W (53702) DHS: Send Time: 308 Msg-Delay 120
    W (53852) DHS: Send Time: 508 Msg-Delay 71
    W (54002) DHS: Send Time: 708 Msg-Delay 26
    W (54542) DHS: Send Time: 908 Msg-Delay 364
    W (54542) DHS: Send Time: 1108 Msg-Delay 169
    W (54932) DHS: Send Time: 1308 Msg-Delay 355
    W (54942) DHS: Send Time: 1508 Msg-Delay 160
    W (55322) DHS: Send Time: 1708 Msg-Delay 346
    W (55332) DHS: Send Time: 1908 Msg-Delay 151
    W (55382) DHS: Send Time: 2108 Msg-Delay 9
    W (55622) DHS: Send Time: 2308 Msg-Delay 48
    W (55782) DHS: Send Time: 2508 Msg-Delay 9
    W (56082) DHS: Send Time: 2708 Msg-Delay 102
    W (56232) DHS: Send Time: 2908 Msg-Delay 54
    W (56382) DHS: Send Time: 3108 Msg-Delay 9
    W (56682) DHS: Send Time: 3308 Msg-Delay 107
    W (56832) DHS: Send Time: 3508 Msg-Delay 58
    W (56992) DHS: Send Time: 3708 Msg-Delay 11
    W (57382) DHS: Send Time: 3908 Msg-Delay 200
    W (57392) DHS: Send Time: 4112 Msg-Delay 8
    W (57772) DHS: Send Time: 4308 Msg-Delay 192
    W (57782) DHS: Send Time: 4508 Msg-Delay 8
    W (58162) DHS: Send Time: 4708 Msg-Delay 182
    W (58192) DHS: Send Time: 4908 Msg-Delay 10 <-- 5 seconds later disturbance ends
    W (58382) DHS: Send Time: 5108 Msg-Delay 9
    W (58582) DHS: Send Time: 5308 Msg-Delay 9
    W (58792) DHS: Send Time: 5508 Msg-Delay 10
    W (58992) DHS: Send Time: 5708 Msg-Delay 12
    W (59182) DHS: Send Time: 5908 Msg-Delay 9
    W (59382) DHS: Send Time: 6108 Msg-Delay 9
    W (59582) DHS: Send Time: 6308 Msg-Delay 9
    W (59792) DHS: Send Time: 6508 Msg-Delay 10
    W (59992) DHS: Send Time: 6708 Msg-Delay 10
    W (60192) DHS: Send Time: 6908 Msg-Delay 14
    W (60382) DHS: Send Time: 7108 Msg-Delay 9
    W (60582) DHS: Send Time: 7308 Msg-Delay 9
    W (60782) DHS: Send Time: 7508 Msg-Delay 9
    W (60982) DHS: Send Time: 7708 Msg-Delay 9
    W (61182) DHS: Send Time: 7908 Msg-Delay 9
    W (61382) DHS: Send Time: 8108 Msg-Delay 9
    W (61582) DHS: Send Time: 8308 Msg-Delay 9
    W (61792) DHS: Send Time: 8508 Msg-Delay 12
    W (61982) DHS: Send Time: 8708 Msg-Delay 9
    W (62182) DHS: Send Time: 8908 Msg-Delay 9
    W (62382) DHS: Send Time: 9108 Msg-Delay 9
    W (62582) DHS: Send Time: 9308 Msg-Delay 9
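
The 10-second pattern marked by hand in the log could also be extracted automatically. The sketch below is an assumption-laden illustration, not part of the test software: the function name find_disturbance_starts, the delay limit, and the quiet-gap rule for separating windows are all hypothetical. The inputs are the log timestamps (the number in parentheses) and the matching Msg-Delay values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: find the start times of disturbance windows.
 * A sample is "slow" if its delay exceeds limit_ms. A slow sample opens a
 * new window if no slow sample was seen within the previous gap_ms.
 * Writes at most max_out start times to starts_ms and returns how many
 * windows were found in total. Timestamps must be in ascending order. */
size_t find_disturbance_starts(const uint32_t *t_ms, const uint32_t *delay_ms,
                               size_t n, uint32_t limit_ms, uint32_t gap_ms,
                               uint32_t *starts_ms, size_t max_out)
{
    size_t found = 0;
    int have_slow = 0;
    uint32_t last_slow_ms = 0;

    for (size_t i = 0; i < n; ++i) {
        if (delay_ms[i] <= limit_ms) continue;   /* fast sample, ignore */
        if (!have_slow || t_ms[i] - last_slow_ms > gap_ms) {
            if (found < max_out) starts_ms[found] = t_ms[i];
            ++found;
        }
        have_slow = 1;
        last_slow_ms = t_ms[i];
    }
    return found;
}
```

With a limit of 50 ms and a quiet gap of 2000 ms, the log above yields two windows, starting at 43092 and 53092, which matches the 10-second spacing described in the post.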

HWuest avatar Jul 19 '23 08:07 HWuest

Hi @HWuest, If the root node was previously connected to a hotspot, and then the hotspot is turned off, the root node will try to maintain the mesh network by attempting to reconnect and perform periodic scanning. The purpose of this is to quickly find available nodes or routers. However, these scanning actions can cause data delays.

tswen avatar Jul 20 '23 06:07 tswen

It also happens before any connection has been made to a hotspot! This is my problem.

The connection test was only meant to show that the problem really depends on the connection status (because I still have no information on whether you could reproduce the problem on your side, or what the root cause is).

So, as stated above, my question is: how can I stop this disruptive scanning process?

Can you give me an API call or parameter to switch it off when it is not needed?

In a routerless network it is the normal situation that there is no router (isn't it?), so this seems to me to be an important function of the mesh-lite network.

(In our mobile application there will be no router available, so the scanning is useless, and varying delays of more than about 200 ms will make the mesh-lite network useless for our application.)

HWuest avatar Jul 20 '23 07:07 HWuest

Because there was no further answer from your side, I continued to try some workarounds (within the standard Wi-Fi function set).

My working solution for now is to disable the Wi-Fi STA mode when I detect that no router could be found within a reasonable time, by calling

esp_wifi_set_mode(WIFI_MODE_AP);

I still get the warning "W (xxxxx) vendor_ie: Scanning in progress, please try again later" from the mesh-lite scan process every 10 seconds, but because STA mode is disabled, no full scan takes place. So only a slight disturbance in the mesh communication is left, which seems acceptable for now.
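
A rough sketch of this workaround is shown below. Only the call esp_wifi_set_mode(WIFI_MODE_AP) comes from the post; the helper names, the timeout parameter, and the use of esp_wifi_sta_get_ap_info() to decide whether the root node currently has a router connection are assumptions made for illustration. The ESP-IDF part is guarded so that the decision helper can also be compiled off-target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decision helper: fall back to AP-only mode once the node has
 * gone timeout_ms without a router connection. Unsigned subtraction keeps
 * the comparison valid across one wrap of the millisecond counter. */
bool should_drop_sta(bool router_connected, uint32_t since_ms,
                     uint32_t now_ms, uint32_t timeout_ms)
{
    return !router_connected && (now_ms - since_ms) >= timeout_ms;
}

#ifdef ESP_PLATFORM
#include "esp_err.h"
#include "esp_timer.h"
#include "esp_wifi.h"

/* Assumed usage: called periodically on the root node after Wi-Fi has been
 * initialised and started. since_ms is the time the search began. */
void check_router_timeout(uint32_t since_ms, uint32_t timeout_ms)
{
    wifi_mode_t mode;
    ESP_ERROR_CHECK(esp_wifi_get_mode(&mode));
    if (mode == WIFI_MODE_AP) {
        return;                       /* already in AP-only mode */
    }

    wifi_ap_record_t ap;
    /* ESP_OK is returned only while the station interface is associated. */
    bool connected = (esp_wifi_sta_get_ap_info(&ap) == ESP_OK);
    uint32_t now_ms = (uint32_t)(esp_timer_get_time() / 1000);

    if (should_drop_sta(connected, since_ms, now_ms, timeout_ms)) {
        ESP_ERROR_CHECK(esp_wifi_set_mode(WIFI_MODE_AP));
    }
}
#endif
```

What counts as a "reasonable time" is left to the caller; the post does not state a value.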

If there is an official solution in the future, please let me know.

HWuest avatar Aug 11 '23 09:08 HWuest

Hello, regarding the optimization for the routerless scenario, the periodic scanning was removed. However, in scenarios where the devices were initially connected to a router and later the router was powered off, this periodic scanning is necessary. This is because the root node cannot predict when it will rediscover the router. Nevertheless, we can consider providing an interface in the future to stop or restart this periodic scanning.

tswen avatar Aug 11 '23 11:08 tswen

Thank you for your support, such an interface would be great.

Especially in the routerless scenario, esp_mesh_lite_set_allowed_level(1); has to be called on the root node, which already reduces the self-configuring functionality of the mesh-lite network (the root node is now fixed).

When a router is expected to come up later, further scanning makes sense.

In the fully routerless scenario, being able to switch the scanning off with a function call would be the right way.

Currently it works for me with esp_wifi_set_mode(WIFI_MODE_AP);, but I found that after a reset the mesh-lite initialisation sometimes immediately finds its own access point and connects to it as a child node of level 3 (I think because STA mode is off, so scanning retrieves the AP information). So before initialising the mesh network I had to explicitly set the Wi-Fi mode back to APSTA (the last status seems to be held in NVS).
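
Setting the mode back before the mesh is initialised could look roughly like this. It is a sketch under assumptions: the helper names are hypothetical, and the placement (after esp_wifi_init(), before esp_mesh_lite_init()) is inferred from the description above rather than taken from mesh-lite documentation. The small enum is only a host-side stand-in so the check can be compiled off-target.

```c
#include <stdbool.h>

#ifdef ESP_PLATFORM
#include "esp_err.h"
#include "esp_wifi.h"
#else
/* Host-side stand-in for the ESP-IDF type, for off-target builds only. */
typedef enum {
    WIFI_MODE_NULL = 0,
    WIFI_MODE_STA,
    WIFI_MODE_AP,
    WIFI_MODE_APSTA
} wifi_mode_t;
#endif

/* Hypothetical helper: any mode other than APSTA is switched back,
 * as described in the post. */
bool needs_apsta_restore(wifi_mode_t mode)
{
    return mode != WIFI_MODE_APSTA;
}

#ifdef ESP_PLATFORM
/* Assumed call order: after esp_wifi_init(), before esp_mesh_lite_init(). */
void restore_apsta_before_mesh_init(void)
{
    wifi_mode_t mode;
    ESP_ERROR_CHECK(esp_wifi_get_mode(&mode));
    if (needs_apsta_restore(mode)) {
        ESP_ERROR_CHECK(esp_wifi_set_mode(WIFI_MODE_APSTA));
    }
}
#endif
```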

Trying to switch the Wi-Fi NVS storage off to overcome this leads to the problem that (when a router is available) the router is no longer found after a reset. The Wi-Fi NVS storage also leads to some strange behaviour when the router SSID or password is changed: the system then often falls back to the stored values, which can lead to connections with the wrong (other than configured) router.

Maybe this information helps to improve the very good mesh-lite network further. If I should open new issues for the Wi-Fi NVS findings, I will have to investigate the behaviour a bit more precisely.

HWuest avatar Aug 11 '23 12:08 HWuest

You can call esp_wifi_set_storage(WIFI_STORAGE_FLASH) when Wi-Fi information should be stored in NVS, and call esp_wifi_set_storage(WIFI_STORAGE_RAM) when it should not be stored in NVS.

tswen avatar Aug 14 '23 02:08 tswen

Yes, sure, but the problem is that the restart behaviour uses some stored information regardless of whether esp_wifi_set_storage(WIFI_STORAGE_FLASH) or esp_wifi_set_storage(WIFI_STORAGE_RAM) is the Wi-Fi NVS storage configuration.

Sometimes the device starts up as a level 3 client (when it was running in the routerless AP-only mode before). After this, a soft reset results in the message: vendor_ie: [Unexpected restart]: the rtc data: level:3, ssid:xx, bssid yy:yy:yy:yy:yy:yy even when WIFI_STORAGE_RAM is set, Wi-Fi NVS usage is off, and the mode is configured as APSTA again early in the software.

From then on, no router can be found after a software reset (which again leads to AP-only mode being reactivated). With a power off/on cycle the device goes directly back to connecting as a level 3 child (there are no other wifi_lite devices on air, so the level should be 1, with the device trying to connect to the router).

If another mesh-lite device is available on level 1 (with or without a router connection), the device starts up as expected (as a level 2 child).

To get out of this connection behaviour I had to take out the two commands I introduced for the routerless mode:

    // esp_mesh_lite_set_allowed_level(1); // enable routerless network connection
    // esp_wifi_set_mode(WIFI_MODE_AP);

and start the device (power cycling) so that it comes up once without finding the router but without setting AP-only mode. From the next startup on, the device recognizes the router again and works normally.

Is information from the last connection (AP-mode status, level, SSID and password) of mesh_lite stored in flash or somewhere else, and how can I avoid this, or even delete the information during startup, so that no wrong values are used which lead to the strange behaviour? What side effects does switching to AP-only mode outside of the mesh_lite functions have, and how can I avoid them?

HWuest avatar Aug 15 '23 12:08 HWuest