esp-mesh-lite

mesh lite node api request time out & long time (AEGHB-791)

Open • yel-best opened this issue 1 year ago • 28 comments

Using Mesh-Lite v0.10.3 and ESP-IDF v5.2.2.

I'm currently using a total of 5 nodes, all with default configurations and no modifications. Here's the logic of the test program I conducted:

Each of the 5 nodes sends an HTTP request every second to an internal IP address. The API is simple: a GET request that always returns the same data structure, and the server responds very quickly.

The diagram below shows the total time, in milliseconds, taken by each HTTP request on the ESP32S3.

You'll notice that the requests are relatively slow. When I access the same API with my computer, the request time is usually under 100 milliseconds. However, the ESP32S3 doesn't meet this requirement. There were no network topology changes among the nodes—there are only 5 nodes in total, all online, with one acting as the root node.

[Image: total time per HTTP request (ms) measured on the ESP32S3 nodes]

yel-best, Aug 26 '24 07:08

Update test results

Update: test results

HTTP requests made by the root node are relatively fast, but from deeper levels such as layers 4 and 5 the results below appear, with delays of about 6000 ms or more. Is this caused by the extra forwarding hops? Is there a solution?

The application runs a test loop: it sends a request to the API, waits for the request to succeed, then waits 2 s before sending the next request.
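To compare the root against layer 4 and 5 nodes, it can help to reduce each node's recorded request times to a few numbers. The following is a minimal host-side sketch in plain C, assuming the per-request times (in ms) have already been collected by a loop like the one described above; the type and function names are hypothetical and are not part of ESP-IDF or Mesh-Lite.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Summary of one node's HTTP request times, in milliseconds.
 * Hypothetical helper for analysing test results; not a Mesh-Lite API. */
typedef struct {
    uint32_t min_ms;
    uint32_t max_ms;
    uint32_t mean_ms;    /* integer mean, rounded down */
    size_t over_limit;   /* number of samples slower than limit_ms */
} latency_summary_t;

/* Reduce n request times to min / max / mean, and count how many exceed
 * limit_ms (for example the ~100 ms a PC sees on the same API). */
static latency_summary_t summarize_latency(const uint32_t *samples_ms, size_t n,
                                           uint32_t limit_ms)
{
    assert(samples_ms != NULL && n > 0);

    latency_summary_t s = { samples_ms[0], samples_ms[0], 0, 0 };
    uint64_t total = 0;   /* 64-bit so long test runs cannot overflow */

    for (size_t i = 0; i < n; i++) {
        uint32_t v = samples_ms[i];
        if (v < s.min_ms) s.min_ms = v;
        if (v > s.max_ms) s.max_ms = v;
        if (v > limit_ms) s.over_limit++;
        total += v;
    }
    s.mean_ms = (uint32_t)(total / n);
    return s;
}
```

Computing one summary per node and grouping the summaries by mesh layer makes it easy to see whether latency grows with depth or only a few requests are outliers.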

[Image: HTTP request times measured from layer 4 and layer 5 nodes]

yel-best, Aug 28 '24 05:08

How many devices have you deployed in total? Is the surrounding wireless environment subject to significant interference?

tswen, Sep 05 '24 02:09

Close to 90 devices. According to a Wi-Fi scanning tool, the Wi-Fi signal quality is fairly good.

yel-best, Sep 06 '24 02:09

@tswen I have some questions about the configuration; I hope you can answer them. Thank you.

Does CONFIG_MESH_LITE_MAX_ROUTER_NUMBER represent the number of children allowed? Does CONFIG_MESH_LITE_MAXIMUM_LEVEL_ALLOWED represent the number of levels allowed?

CONFIG_MESH_LITE_MAX_ROUTER_NUMBER=5
CONFIG_MESH_LITE_MAXIMUM_LEVEL_ALLOWED=10

yel-best, Sep 11 '24 06:09

With 90 devices all on the same channel, the interference between them is relatively large. We recommend increasing the beacon interval appropriately:

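wifi_config_t wifi_cfg;
// Read the current SoftAP config, raise the beacon interval (unit: TU), and write it back.
ESP_ERROR_CHECK(esp_wifi_get_config(WIFI_IF_AP, &wifi_cfg));
wifi_cfg.ap.beacon_interval = 400;
ESP_ERROR_CHECK(esp_wifi_set_config(WIFI_IF_AP, &wifi_cfg));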
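The beacon_interval field is expressed in 802.11 time units (1 TU = 1024 µs), so it is easy to misread the value as milliseconds. The host-side sketch below converts TU to microseconds and checks a value against the range that, as far as we know, ESP-IDF documents for wifi_ap_config_t.beacon_interval (100 to 60000 TU, multiples of 100, default 100); verify that range against the esp_wifi_types.h header of your IDF version. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 1 TU (802.11 time unit) = 1024 microseconds. */
#define US_PER_TU 1024u

/* Assumed ESP-IDF range for wifi_ap_config_t.beacon_interval:
 * 100 to 60000 TU, in multiples of 100. */
static bool beacon_interval_is_valid(uint16_t tu)
{
    return tu >= 100 && tu <= 60000 && tu % 100 == 0;
}

/* Convert a beacon interval in TU to microseconds. */
static uint32_t beacon_interval_tu_to_us(uint16_t tu)
{
    assert(beacon_interval_is_valid(tu));
    return (uint32_t)tu * US_PER_TU;
}
```

The default of 100 TU is about 102 ms between beacons; 400 TU is about 410 ms and 1000 TU is about 1.02 s. Each SoftAP therefore sends roughly 4 or 10 times fewer beacons, which frees airtime on a channel shared by many mesh nodes.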

tswen, Sep 11 '24 07:09

CONFIG_MESH_LITE_MAX_ROUTER_NUMBER is not used yet, so you don't need to set it. CONFIG_MESH_LITE_MAXIMUM_LEVEL_ALLOWED: yes, if it is set to 10, the network can have at most ten levels.

tswen, Sep 11 '24 07:09

The maximum number of layers allowed in the network: 1-15

The maximum number of downstream connections that each node can have: 1-10

If I want a maximum of 3 downstream connections per node, which configuration parameter should I modify?

Thank you for your help

yel-best, Mar 10 '25 05:03

The configuration CONFIG_BRIDGE_SOFTAP_MAX_CONNECT_NUMBER specifies the maximum number of stations that can directly connect to this SoftAP.
The configuration CONFIG_MESH_LITE_MAXIMUM_LEVEL_ALLOWED can generally remain at its default value. If CONFIG_BRIDGE_SOFTAP_MAX_CONNECT_NUMBER is set to 10 and CONFIG_MESH_LITE_MAXIMUM_LEVEL_ALLOWED is set to 3, the entire mesh can support up to 111 devices in the most ideal scenario: 1 device at level 1, 10 devices at level 2, and 100 devices at level 3.
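The 111-device figure is a geometric series: with at most c direct children per node and L levels, the ideal fully populated mesh holds 1 + c + c² + … + c^(L−1) nodes. The sketch below computes that bound, assuming c corresponds to CONFIG_BRIDGE_SOFTAP_MAX_CONNECT_NUMBER and L to CONFIG_MESH_LITE_MAXIMUM_LEVEL_ALLOWED as described in the reply above. The function name is hypothetical, and real meshes usually form fewer nodes than this upper bound.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on nodes in the ideal, fully populated case: level 1 holds
 * the root, and every node on levels 1..max_level-1 accepts max_children
 * direct children.
 *   total = 1 + c + c^2 + ... + c^(max_level - 1) */
static uint64_t mesh_ideal_capacity(uint32_t max_children, uint32_t max_level)
{
    assert(max_children >= 1 && max_level >= 1);

    uint64_t total = 0;
    uint64_t at_level = 1;              /* nodes on the current level */
    for (uint32_t level = 1; level <= max_level; level++) {
        total += at_level;
        at_level *= max_children;       /* next level fans out from this one */
    }
    return total;
}
```

With 10 children and 3 levels this gives 111; with 3 children and 3 levels it gives 13, and with 5 children and 3 levels it gives 31.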

tswen, Mar 18 '25 08:03

At present I have 3 nodes per layer and 3 layers. All nodes connect to MQTT, and each sends a message of about 500 B every 30 s. There are 100 ESP32s here in total. In this trial mesh_id = 71, and the mesh Wi-Fi and router Wi-Fi settings are identical on every device. In testing, nodes on the last layer (layer 3) are slow to send and receive messages: the delay in sending to and receiving from MQTT exceeds 10 s. What is the problem? I have not changed any other default parameters. Do I need to increase mesh network throughput, or take other steps to improve speed?

thanks 🥰

yel-best, Mar 18 '25 08:03

Could you clarify whether the significant message delay you mentioned occurs in:

1. node-to-node communication via Mesh-Lite APIs, or
2. MQTT-based communication?

Additionally, could you try adjusting the beacon interval to 1000? This should notably improve performance in interference-prone environments.

tswen, Apr 11 '25 08:04

Hi, thank you for your answer.

Node-to-node communication is not used.

All devices connect and send data over MQTT, at about one message per second of roughly 500 B. I had tried wifi_cfg.ap.beacon_interval = 400; I will follow your suggestion and set wifi_cfg.ap.beacon_interval = 1000.

However, our deployment is relatively dense and there are few routers, so most of the load falls on Mesh-Lite. I want the mesh to stay well balanced, with nodes only on layers 1 to 3. At present we use 3 layers with 3 nodes per layer, and 300 ESP32s are deployed on site.

I sometimes find that a device keeps remembering its previous upstream AP. I want the device to forget the upstream AP on every restart and rescan before connecting. How should I do this?

yel-best, Apr 14 '25 07:04

Hi @yel-best ,

We're trying to build something similar to your system, but we are having issues with overall mesh stability even with fewer than 20 nodes. We expected to reach a mesh size similar to yours, or possibly to split a 90-node mesh into smaller meshes to reduce stress on the network.

Our configuration is as follows (we are also using PCB antennas):

.vendor_id = {76, 77},
.mesh_id = 10,
.max_connect_number = 10,
.max_router_number = 1,
.max_level = 3,
.max_node_number = 255,
.join_mesh_ignore_router_status = 1,
.join_mesh_without_configured_wifi = 1,
.leaf_node = 0,
.ota_data_len = 0,
.ota_wnd = 0,
.softap_ssid = Mesh_Bridge,
.softap_password = adminPassword,
.device_category = ESP32S3

Could you share the mesh_lite_config_t you use to initialize Mesh-Lite, and tell us whether you are using external antennas via the U.FL (IPEX) connector?

Many thanks, and I look forward to seeing this issue resolved.

Best,
BR

BR-Coding-cmd, Apr 14 '25 10:04

If 300 devices are deployed very densely, channel interference between devices will be significant, and it affects every device, not just the root node. We strongly advise against overly dense device placement.

If you want to prevent reconnection to a previously linked parent node after esp_restart, call esp_mesh_lite_erase_rtc_store before rebooting.

tswen, Apr 15 '25 02:04

If you are not using the no_router scenario, we recommend setting join_mesh_ignore_router_status to 0. When the root node connects to a router, setting this parameter to 1 may negatively affect mesh formation.

tswen, Apr 15 '25 02:04

Hi, I found that with the configuration "3 nodes per layer, 3 layers", the third-layer nodes still had downstream nodes, as many as 8 to 9, which is more than I expected. I determined this from the value returned by esp_mesh_lite_get_mesh_node_number on a third-layer node.

Is there a problem with my level configuration? There are 100 ESP32 units on site in a moderately dense area, spaced roughly 1 to 2 m apart, used for fixed-point data collection. With all devices sharing the same mesh configuration (mesh ID, mesh SSID and password), will adjusting the size of the topology improve the situation? For example, I could allow up to 3 layers with 5 nodes per layer. By my calculation, one mesh root node can then support about 30 nodes. Is my calculation correct?

thanks 🥰

yel-best, May 15 '25 04:05

open up to 3 layers, with 5 nodes on each layer. Calculated this way, one mesh root node can support about 30 nodes.

This calculation is correct.

If the issue is channel congestion, adjusting the maximum number of devices in the mesh topology will not improve performance. Only increasing the beacon interval and spacing between devices will help. However, based on your description, the device density seems acceptable.

Reducing the maximum number of devices in the mesh topology (from 100 to 30) may lower communication latency because the root node will have less data forwarding pressure. Additionally, the mesh topology should then have fewer devices that are physically far apart or at high hierarchical levels.
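The forwarding-pressure argument can be put into rough numbers. The sketch below is a back-of-envelope estimate, assuming every node in one mesh (root included) publishes a fixed-size message at a fixed period, using the 500 B per second and 500 B per 30 s figures reported in this thread, and that all of it leaves through the root's router uplink. The function name is hypothetical, and only application payload is counted.

```c
#include <assert.h>
#include <stdint.h>

/* Application payload (bytes per second) that leaves a mesh through the
 * root when each of `nodes` devices sends msg_bytes once every period_s
 * seconds. Payload only: MQTT, TCP/IP and 802.11 headers, ACKs,
 * retransmissions and the extra airtime of multi-hop forwarding are
 * ignored, so the real channel load is higher. */
static uint32_t root_payload_bytes_per_s(uint32_t nodes, uint32_t msg_bytes,
                                         uint32_t period_s)
{
    assert(nodes >= 1 && period_s >= 1);
    return (uint32_t)(((uint64_t)nodes * msg_bytes) / period_s);
}
```

At 500 B per second, 100 nodes put about 50,000 B/s of payload through one root, while 30 nodes put about 15,000 B/s. On a single shared channel each hop is a separate transmission, so messages from deeper levels also cost more airtime than this payload figure suggests.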

tswen, May 15 '25 07:05

OK.

However, I have found that with this many devices deployed, after an ESP32 restarts it prints many log lines like the ones below: it keeps scanning Wi-Fi without ever connecting to an upstream AP, so the device goes completely offline and cannot get back online. What is the reason for this? Are there too many devices, or is it a problem with some settings?

I (2528182) [ESP_Mesh_Lite_Comm]: approved: 1
I (2528182) app_bee: WI-FI Station Finished scanning AP
I (2528190) [vendor_ie]: esp_mesh_lite_wifi_scan_start return ESP_OK
I (2528690) [ESP_Mesh_Lite_Comm]: Mesh-Lite Comm Scan done
I (2528691) [ESP_Mesh_Lite_Comm]: approved: 1
I (2528691) app_bee: WI-FI Station Finished scanning AP
I (2529033) [vendor_ie]: esp_mesh_lite_wifi_scan_start return ESP_OK
I (2529533) [ESP_Mesh_Lite_Comm]: Mesh-Lite Comm Scan done
I (2529534) [ESP_Mesh_Lite_Comm]: approved: 1
I (2529534) app_bee: WI-FI Station Finished scanning AP
I (2529559) [vendor_ie]: esp_mesh_lite_wifi_scan_start return ESP_OK
I (2530059) [ESP_Mesh_Lite_Comm]: Mesh-Lite Comm Scan done
I (2530060) [ESP_Mesh_Lite_Comm]: approved: 1
I (2530060) app_bee: WI-FI Station Finished scanning AP
I (2530249) [vendor_ie]: esp_mesh_lite_wifi_scan_start return ESP_OK
I (2530750) [ESP_Mesh_Lite_Comm]: Mesh-Lite Comm Scan done
I (2530750) [ESP_Mesh_Lite_Comm]: approved: 1
I (2530751) app_bee: WI-FI Station Finished scanning AP
I (2530858) [vendor_ie]: esp_mesh_lite_wifi_scan_start return ESP_OK
I (2531358) [ESP_Mesh_Lite_Comm]: Mesh-Lite Comm Scan done
I (2531359) [ESP_Mesh_Lite_Comm]: approved: 1
I (2531360) app_bee: WI-FI Station Finished scanning AP
I (2531612) [vendor_ie]: esp_mesh_lite_wifi_scan_start return ESP_OK

yel-best, May 16 '25 05:05

Where is the log message app_bee: WI-FI Station Finished scanning AP printed from? Does your application layer code call functions like esp_wifi_scan_start?

tswen, May 19 '25 02:05

static void app_scan_start(void)
{
    // Implementation of scan start callback
    ESP_LOGD(TAG, "mesh lite app_scan_start");
}

static void app_scan_end(void)
{
    // Implementation of scan end callback
    ESP_LOGD(TAG, "mesh lite app_scan_end");
}

static esp_mesh_lite_scan_cb_t mesh_lite_scan_cb = {
    .scan_start_cb = app_scan_start,
    .scan_end_cb = app_scan_end,
};

esp_mesh_lite_scan_cb_register(&mesh_lite_scan_cb);

esp_wifi_scan_start is not called directly.

I only registered these callbacks in the main function so that I could see logs of the scan activity.

Could this affect anything?

yel-best, May 19 '25 06:05

Could you please call esp_mesh_lite_core_log_enable(true) after esp_mesh_lite_init, then provide the complete log files from power-on until the issue occurs, for debugging?

tswen, May 20 '25 03:05

OK, I will provide it later.

In addition, I want to create two completely independent meshes in the same area: for example, mesh_wifi_1 for the temperature sensors and mesh_wifi_2 for the humidity sensors. Would separating them improve the stability of the mesh network? How should I achieve this?

yel-best, May 21 '25 03:05

log.txt

yel-best, May 21 '25 08:05

Different mesh networks can be distinguished by setting different mesh IDs.

tswen, May 23 '25 08:05

Could you confirm which version you are using? You can check this by looking for the "[vendor_ie]: Mesh-Lite commit id: xxxxxx" message during the mesh-lite initialization at boot. Alternatively, you may update mesh-lite to the latest master branch version and verify if the issue persists.

tswen, May 23 '25 08:05

I am using mesh_lite v1.0.1.

yel-best, May 23 '25 09:05

You can check this by looking for the "[vendor_ie]: Mesh-Lite commit id: xxxxxx" log line.

I need the commit ID.

tswen, May 23 '25 09:05

I (1790) [vendor_ie]: Mesh-Lite commit id: 454ded1

I (1796) [vendor_ie]: Mesh ID: 77

yel-best, May 26 '25 06:05

Could you please provide the complete system logs from power-on until the moment the issue occurs? The logs shared previously don't clearly show the point where the problem emerges.

tswen, May 28 '25 11:05