
Error rate and unidirectional throughput

MarcFinetRtone opened this issue 1 year ago • 3 comments

Thanks for this great project. We tested it with a modified version of bw-test.py from the MAVLink project. I have not created the MR yet, so here's the gist: https://gist.github.com/MarcFinetRtone/0d2732034f2a6a863614d37d7c6ed782

I added support for flow control (I will open an MR here for this to fix #57); branch: https://github.com/MarcFinetRtone/dronebridge-esp32/tree/feat/add-flow-control.

The setup:

  • two ESP32s
  • one as AP, the other as STA
  • using ./bw-test.py --baud 460_800 --lat --device /dev/serial/by-id/… > >(ts %H:%M:%S | tee -a $(date +%Y-%m-%d-%H%M%S.log))& pid=$!; doit(){ trap "kill $pid; return 0;" INT; while true; do read -t 10; kill -USR1 "$pid"; done;}; doit

The configuration (AP here):

{
        "esp32_mode":   1,
        "wifi_ssid":    "WF_AP",
        "wifi_pass":    "<REDACTED>",
        "ap_channel":   6,
        "trans_pack_size":      64,
        "tx_pin":       16,
        "rx_pin":       17,
        "cts_pin":      25,
        "rts_pin":      23,
        "rts_thresh":   30,
        "baud": 460800,
        "telem_proto":  4,
        "ltm_pp":       1,
        "msp_ltm_port": 0,
        "ap_ip":        "192.168.2.1"
}
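
For reference, here is a minimal sketch of how I understand the pins and rts_thresh above map onto ESP-IDF hardware flow control (my assumption only; the feat/add-flow-control branch may wire it up differently, and UART_NUM_2 is just my guess for the data UART):

#include "driver/uart.h"

#define DB_UART_NUM   UART_NUM_2   /* assumed data UART */
#define DB_TX_PIN     16
#define DB_RX_PIN     17
#define DB_RTS_PIN    23
#define DB_CTS_PIN    25
#define DB_RTS_THRESH 30           /* "rts_thresh" from the config above */

static void db_uart_init_with_flow_control(void)
{
    const uart_config_t cfg = {
        .baud_rate = 460800,
        .data_bits = UART_DATA_8_BITS,
        .parity    = UART_PARITY_DISABLE,
        .stop_bits = UART_STOP_BITS_1,
        .flow_ctrl = UART_HW_FLOWCTRL_CTS_RTS,   /* enable RTS/CTS */
        .rx_flow_ctrl_thresh = DB_RTS_THRESH,
        .source_clk = UART_SCLK_DEFAULT,
    };
    ESP_ERROR_CHECK(uart_param_config(DB_UART_NUM, &cfg));
    ESP_ERROR_CHECK(uart_set_pin(DB_UART_NUM, DB_TX_PIN, DB_RX_PIN,
                                 DB_RTS_PIN, DB_CTS_PIN));
    ESP_ERROR_CHECK(uart_driver_install(DB_UART_NUM, 1024, 1024, 0, NULL, 0));
}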

The results:

14:41:06 Latency: 10-last: mean: 48.479ms, median: 49.968
14:41:06 Error:  10-last: errors=1028/9853pkt 10.43% 0.11/s
14:41:07 δ 61.81ms
14:41:07  371 sent ( 367.5/s), 1179 received (1167.8/s),  217 errors bwin= 31.56 kB/s bwout= 11.97 kB/s
14:41:08 δ 49.69ms
14:41:08  376 sent ( 373.4/s), 1088 received (1080.5/s),  117 errors bwin= 31.75 kB/s bwout= 12.16 kB/s
14:41:09 δ 45.77ms
14:41:09  376 sent ( 371.4/s), 1013 received (1000.7/s),   66 errors bwin= 30.69 kB/s bwout= 12.10 kB/s
14:41:10  391 sent ( 388.8/s), 1016 received (1010.4/s),   38 errors bwin= 31.85 kB/s bwout= 12.66 kB/s
14:41:11 δ 50.14ms
14:41:11 δ 60.73ms
14:41:11  391 sent ( 389.5/s),  940 received ( 936.3/s),   38 errors bwin= 29.39 kB/s bwout= 12.68 kB/s
[…]
14:41:24  436 sent ( 435.7/s),    0 received (   0.0/s),    0 errors bwin=  0.00 kB/s bwout= 14.19 kB/s
14:41:25  436 sent ( 433.9/s),  219 received ( 218.0/s),    3 errors bwin=  7.04 kB/s bwout= 14.13 kB/s
14:41:26  421 sent ( 419.4/s),  421 received ( 419.4/s),    0 errors bwin= 13.68 kB/s bwout= 13.66 kB/s
14:41:26 δ 27.10ms
14:41:26 δ 19.75ms
14:41:27  416 sent ( 415.8/s),  427 received ( 426.8/s),    0 errors bwin= 13.91 kB/s bwout= 13.54 kB/s
14:41:27 Latency: 10-last: mean: 43.993ms, median: 47.422
14:41:27 Error:  10-last: errors=3/1067pkt 0.28% 0.00/s
14:41:28  416 sent ( 415.8/s),  435 received ( 434.8/s),    0 errors bwin= 14.17 kB/s bwout= 13.54 kB/s
14:41:28 δ 27.72ms
14:41:29  421 sent ( 417.1/s),  432 received ( 428.0/s),    0 errors bwin= 13.95 kB/s bwout= 13.59 kB/s
14:41:29 δ 11.82ms
14:41:30  421 sent ( 415.7/s),  439 received ( 433.4/s),    0 errors bwin= 14.09 kB/s bwout= 13.54 kB/s

The issues:

  • we noticed some errors even without long distance (i.e. a few meters, < 10 m) when the link is saturated. In the results above, we added --delay=10 on the peer (that lowered the throughput, but also brought the errors down to 0). Even flow control does not help here.
  • we cannot get a unidirectional stream (i.e. with --no-tx on one device). Since that is how it will run live, we wanted to test this situation (to see whether the rover → ground-station throughput increases).

Any idea?

MarcFinetRtone, Feb 14 '24 21:02

Thank you for your detailed investigation. I also came across this issue when testing with two ESP32s. There should be no issue when you only use one ESP32 (AP or client) in combination with a regular PC connecting via Wi-Fi to the ESP.

I have a few best guesses that I have not been able to test so far:

My tests without flow control showed even worse results, with a packet loss of up to 90%. I think the ESP32 hardware can do much more, so there must be an issue with the implementation here.

seeul8er, Feb 15 '24 12:02

I just set the max wait time of uart_read_bytes to zero and it had quite an impact. My loss rate is down to <1% now! 200 ms was very unreasonable. I think the issue did not pop up until now because the upstream traffic from the GCS is rather low, so waiting that long for data did not happen too often. Now that the ticks to wait are set to zero, I think it would also make sense to replace TRANS_RD_BYTES_NUM with the transparent packet size TRANSPARENT_BUF_SIZE. I haven't tested that, but it should improve throughput even further.
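
Roughly the idea (a sketch only, not the exact code in this repo; forward_to_clients() is just a stand-in for whatever sends the UART data to the network side):

#include <stdint.h>
#include "driver/uart.h"

#define TRANSPARENT_BUF_SIZE 64   /* transparent packet size ("trans_pack_size") */

/* Stand-in for the function that forwards data over WiFi/UDP/TCP. */
extern void forward_to_clients(const uint8_t *data, int len);

void read_uart_and_forward(void)
{
    static uint8_t serial_buffer[TRANSPARENT_BUF_SIZE];
    /* ticks_to_wait = 0: return immediately with whatever is already in the RX
       buffer instead of blocking, and read up to TRANSPARENT_BUF_SIZE bytes
       instead of TRANS_RD_BYTES_NUM. */
    int read = uart_read_bytes(UART_NUM_2, serial_buffer, TRANSPARENT_BUF_SIZE, 0);
    if (read > 0) {
        forward_to_clients(serial_buffer, read);
    }
}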

Waiting for your results

seeul8er, Feb 15 '24 21:02

The issue should be gone/improved with the new v1.5 release.

seeul8er, Mar 08 '24 17:03