Respectful Discussion and Suggestions Regarding the esp_eth and the emac driver (IDFGH-16281)
Checklist
- [x] Checked the issue tracker for similar issues to ensure this is not a duplicate.
- [x] Provided a clear description of your suggestion.
- [x] Included any relevant context or examples.
Issue or Suggestion Description
Recently, I conducted a more in-depth review of the esp_eth component and the related network protocol stack source code. Combined with my testing, this review revealed the following issues:
Poor small-packet performance in iperf testing.
When I forcibly set the MSS to 90, there were instances where no packets were received for an extended period! I also used hping for ICMP flood testing, and the system crashed when the packet interval was less than 500 µs.
I believe this issue is primarily caused by overly frequent interrupts. Whether during small-packet reception or under an ICMP flood, interrupts are triggered at a high rate, and each interrupt results in a context switch—an operation with considerable overhead.
On the other hand, our driver does not currently account for scenarios where multiple packets coexist in the EMAC. Regardless of whether it is the DM9051 or another EMAC, edge-triggered mode is invariably used for interrupt configuration. However, when multiple packets coexist in the EMAC, the interrupt pin level will not be pulled low after only one packet has been read. Unfortunately, our driver is currently unable to perform another read in such cases.
if (emac->int_gpio_num >= 0) {
gpio_func_sel(emac->int_gpio_num, PIN_FUNC_GPIO);
gpio_input_enable(emac->int_gpio_num);
gpio_pulldown_en(emac->int_gpio_num);
gpio_set_intr_type(emac->int_gpio_num, GPIO_INTR_POSEDGE);
gpio_intr_enable(emac->int_gpio_num);
gpio_isr_handler_add(emac->int_gpio_num, dm9051_isr_handler, emac);
}
To address this, I converted the interrupt-driven approach to polling, which alleviated the issue. However, the existing polling implementation still has substantial room for optimization.
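The stall described above can be illustrated with a small host-side model. This is only a sketch under stated assumptions, not the DM9051 driver: all `model_*` names are hypothetical, and the chip is assumed to hold its INT pin asserted while any frame remains buffered, as described above. It contrasts reading one frame per rising edge with reading until the pin is deasserted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an SPI Ethernet chip (not a real driver type).
 * Assumption: INT stays asserted while at least one frame is buffered. */
typedef struct {
    uint32_t frames_pending; /* frames waiting in the chip's RX SRAM */
    uint32_t edges_latched;  /* rising edges seen by the MCU, not yet serviced */
} model_emac_t;

/* Current level of the modeled INT pin. */
bool model_int_level(const model_emac_t *e)
{
    assert(e != NULL);
    return e->frames_pending > 0;
}

/* A frame arrives: a rising edge occurs only if the pin was low before. */
void model_frame_arrives(model_emac_t *e)
{
    bool was_high = model_int_level(e);
    e->frames_pending++;
    if (!was_high) {
        e->edges_latched++;
    }
}

/* Strategy A: read exactly one frame per latched rising edge. */
uint32_t model_service_one_per_edge(model_emac_t *e)
{
    assert(e != NULL);
    uint32_t handled = 0;
    while (e->edges_latched > 0) {
        e->edges_latched--;
        if (e->frames_pending > 0) {
            e->frames_pending--;
            handled++;
        }
    }
    return handled;
}

/* Strategy B: after a wake-up, keep reading until INT is deasserted. */
uint32_t model_service_until_deasserted(model_emac_t *e)
{
    assert(e != NULL);
    uint32_t handled = 0;
    e->edges_latched = 0;
    while (model_int_level(e)) {
        e->frames_pending--;
        handled++;
    }
    return handled;
}
```

In this model, if three frames are buffered before the task runs, strategy A reads one frame and then waits for an edge that never arrives because the pin never returns low, while strategy B reads all three.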
With regard to network-performance optimization, Intel’s DPDK offers valuable inspiration. Taking into account the characteristics of the ESP32 platform, I believe the following improvements are feasible:
- Replace hard interrupts with polling. We can poll the interrupt pin level. Notably, all Espressif products released in recent years support Dedicated GPIO, which allows specialized instructions to read the interrupt pin levels of multiple network interfaces rapidly, thereby significantly reducing the time cost of polling.
- Process multiple packets per wake-up. Register the receive handlers of all network interfaces in a single queue and assign one task to service them. Each time any interface receives data, it wakes this task, which then iterates over each interface, reading packets until the corresponding interrupt pin level is deasserted. This approach can markedly reduce scheduling overhead and fully utilize the EMAC’s internal SRAM as buffering, mitigating packet loss.
- Leverage hardware offloading. Tasks such as CRC and checksum calculation can be delegated more extensively to the EMAC. Erroneous packets can be dropped before reaching lwIP, preventing unnecessary load. Unfortunately, checksum offload is currently disabled by default. Meanwhile, in IDF’s lwIP configuration, TCP checksums are enabled by default and cannot be disabled via menuconfig; I believe this work can and should be handled by the EMAC.
/* do not generate checksum for UDP, TCP and IPv4 packets */
ESP_GOTO_ON_ERROR(dm9051_register_write(emac, DM9051_TCSCR, 0x00), err, TAG, "write TCSCR failed");
/* disable check sum for receive packets */
ESP_GOTO_ON_ERROR(dm9051_register_write(emac, DM9051_RCSCSR, 0x00), err, TAG, "write RCSCSR failed");
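For context on what offloading would save: the checksum otherwise computed in software for IPv4, TCP and UDP is the 16-bit ones'-complement Internet checksum (RFC 1071). The following is a minimal, unoptimized sketch of that algorithm. `model_inet_checksum` is a hypothetical name used for illustration and is not lwIP's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over `len` bytes, treated as big-endian
 * 16-bit words. The 32-bit accumulator cannot overflow for inputs of
 * Ethernet-frame size. Hypothetical helper, not an lwIP function. */
uint16_t model_inet_checksum(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint32_t sum = 0;
    size_t i = 0;

    /* Add up all complete 16-bit words. */
    for (; i + 1 < len; i += 2) {
        sum += (uint32_t)((data[i] << 8) | data[i + 1]);
    }
    /* An odd trailing byte is padded with a zero low byte. */
    if (i < len) {
        sum += (uint32_t)(data[i] << 8);
    }
    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16) {
        sum = (sum & 0xFFFFu) + (sum >> 16);
    }
    return (uint16_t)~sum;
}
```

Applied to an IPv4 header whose checksum field is zeroed, the function yields the value to store in that field; applied to a header that already carries a correct checksum, it returns 0. This per-packet pass over the data is the work a checksum-capable EMAC could take over.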
Additionally, for lwIP implementation, platforms like the ESP32-S3 that include SIMD instruction extensions can achieve better performance. For example, using the VLD instruction to load 128-bit wide data in a single operation, together with VCMP and other extended instructions for rapid wide comparisons and logic operations, can significantly improve system throughput.
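Returning to the second proposal in the list above, one task that services every interface per wake-up can be modeled on the host as follows. This is a sketch only: the `model_*` names are hypothetical, and `model_read_int_mask` merely stands in for a bundled read of all interrupt pins (the role proposed above for Dedicated GPIO). It does not use the real Dedicated GPIO API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-interface state for the model. */
typedef struct {
    uint32_t frames_pending;   /* frames buffered in this interface's chip */
    uint32_t frames_delivered; /* frames handed to the stack so far */
} model_if_t;

/* Stand-in for reading all INT pins at once: bit i is set while
 * interface i still has at least one frame waiting. */
uint32_t model_read_int_mask(const model_if_t *ifs, size_t n)
{
    uint32_t mask = 0;
    for (size_t i = 0; i < n; i++) {
        if (ifs[i].frames_pending > 0) {
            mask |= (uint32_t)1 << i;
        }
    }
    return mask;
}

/* One wake-up of the shared task: take one frame from every asserted
 * interface per pass, and repeat until all INT lines are low.
 * Returns the number of frames delivered during this wake-up. */
uint32_t model_service_all(model_if_t *ifs, size_t n)
{
    assert(ifs != NULL && n <= 32);
    uint32_t total = 0;
    uint32_t mask;

    while ((mask = model_read_int_mask(ifs, n)) != 0) {
        for (size_t i = 0; i < n; i++) {
            if (mask & ((uint32_t)1 << i)) {
                ifs[i].frames_pending--;
                ifs[i].frames_delivered++;
                total++;
            }
        }
    }
    return total;
}
```

Taking one frame per asserted interface on each pass keeps a busy interface from starving the others. Whether a shared task actually outperforms per-interface tasks on real hardware is not something this model can show.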
Use of memcpy from buffer to stack_input in the driver without implementation of zero-copy.
if (status & ISR_PR) {
do {
uint32_t buf_len;
if (emac->parent.receive(&emac->parent, emac->rx_buffer, &buf_len) == ESP_OK) {
/* if there is waiting frame */
if (buf_len > 0) {
uint8_t *buffer = malloc(buf_len);
if (buffer == NULL) {
ESP_LOGE(TAG, "no mem for receive buffer");
} else {
memcpy(buffer, emac->rx_buffer, buf_len);
ESP_LOGD(TAG, "receive len=%" PRIu32, buf_len);
/* pass the buffer to stack (e.g. TCP/IP layer) */
emac->eth->stack_input(emac->eth, buffer, buf_len);
}
}
} else {
ESP_LOGE(TAG, "frame read from module failed");
}
} while (emac->packets_remain);
}
Firstly, through tracing, I found that the emac->parent.receive method is not used anywhere except within the emac_xxx_task function. I honestly do not understand the intended purpose of this method.
Based on this observation, I have rewritten the emac_ch390_receive method (using ch390 as an example).
static esp_err_t emac_ch390_receive2(esp_eth_mac_t *mac, uint8_t **buf, uint32_t *length)
{
// ... some other code
if (ready & CH390_PKT_RDY) {
ESP_GOTO_ON_ERROR(ch390_io_memory_read(emac, (uint8_t *) & (rx_header), sizeof(rx_header)),
err, TAG, "peek rx header failed");
*length = (rx_header.length_high << 8) + rx_header.length_low;
if (rx_header.status & RSR_ERR_MASK) {
ch390_drop_frame(emac, *length);
*length = 0;
return ESP_ERR_INVALID_RESPONSE;
} else if (*length > ETH_MAX_PACKET_SIZE) {
/* reset rx memory pointer */
ESP_GOTO_ON_ERROR(ch390_io_register_write(emac, CH390_MPTRCR, MPTRCR_RST_RX), err, TAG, "reset rx pointer failed");
return ESP_ERR_INVALID_RESPONSE;
} else {
*buf=heap_caps_malloc(*length,MALLOC_CAP_DMA);
ESP_GOTO_ON_ERROR(ch390_io_memory_read(emac, *buf, *length), err, TAG, "read rx data failed");
*length -= ETH_CRC_LEN;
*buf=realloc(*buf, *length);
}
} else {
*length = 0;
}
return ESP_OK;
}
// ... some other code
In the emac_ch390_task, I chose to pass the buffer pointer directly to stack_input.
if (status & ISR_PR) {
do {
// if (emac->parent.receive(&emac->parent, emac->rx_buffer, &emac->rx_len) == ESP_OK) {
// if (emac->rx_len == 0) {
// break;
// } else {
// ESP_LOGD(TAG, "receive len=%lu", emac->rx_len);
// /* allocate memory and check whether allocation failed */
// buffer = malloc(emac->rx_len);
// if (buffer == NULL) {
// ESP_LOGE(TAG, "no memory for receive buffer");
// continue;
// }
// /* pass the buffer to stack (e.g. TCP/IP layer) */
// memcpy(buffer, emac->rx_buffer, emac->rx_len);
// emac->eth->stack_input(emac->eth, buffer, emac->rx_len);
// }
// } else {
// ESP_LOGE(TAG, "frame read from module failed");
// break;
// }
if (emac_ch390_receive2(&emac->parent, &buffer, &emac->rx_len) == ESP_OK) {
if (emac->rx_len == 0) {
break;
} else {
ESP_LOGD(TAG, "receive len=%lu", emac->rx_len);
emac->eth->stack_input(emac->eth, buffer, emac->rx_len);
}
} else {
ESP_LOGE(TAG, "frame read from module failed");
break;
}
} while (1);
}
I fully understand that addressing these issues will inevitably require significant modifications to the esp_eth component. I would like to ask whether you would be open to accepting a pull request for these changes.
Hi @SergeyKharenko,
Thank you for taking the time to review the esp_eth component in such depth and for providing detailed observations, tests, and proposals. It’s great to see this level of engagement in improving Espressif’s Ethernet drivers. Before diving into your points, could you please confirm which devices you used for testing? Was it DM9051 and CH390, or others?
When I forcibly set the MSS to 90, there were instances where no packets were received for an extended period! I also used hping for ICMP flood testing, and the system crashed when the packet interval was less than 500 µs.
It would be valuable if you could contribute such tests to
https://github.com/espressif/esp-idf/tree/master/components/esp_eth/test_apps/main
I believe this issue is primarily caused by overly frequent interrupts.
Have you confirmed this using an oscilloscope? For example, W5500 has a feature called Interrupt Assert Wait Time that avoids such behavior. If this is not present in DM9051/CH390, then our driver may indeed require changes to handle it. This is a very useful observation.
On the other hand, our driver does not currently account for scenarios where multiple packets coexist in the EMAC.
Multiple-frame retrieval from EMAC in a single interrupt should already be ensured by
do { } while (emac->packets_remain); in rx_task.
However, the existing polling implementation still has substantial room for optimization.
Could you share more details? Any optimization ideas are highly welcome.
Also, could you elaborate on replacing interrupts with polling? The main advantage of interrupts is low response latency — in polling mode, responsiveness depends on the polling interval. Interrupts also avoid unnecessary context switches when there is no traffic.
Process multiple packets per wake-up...
I fully agree on processing multiple packets per wake-up — current drivers already attempt this, as noted above. If it’s not functioning correctly, that’s a bug we should fix. Any specific proposal would be welcome.
Regarding the idea of a single task serving all interfaces, I’m not yet convinced about its benefits. Performance gains from a single task might be minimal if “multiple packets per wake-up” works as intended — the SPI link is likely the bottleneck. Separate tasks provide flexibility, such as assigning different priorities to each interface, and help keep related logic together for maintainability. I’m also considering renaming the current rx_task to worker_task so it can handle more than just frame reception. Some users want to also monitor transmit events and EMAC states (reliability is important for IoT, though it often conflicts with pure performance). Handling these requires processing various interrupt types outside of ISR context, and creating new tasks for each would be inefficient.
Leverage hardware offloading.
We have a similar internal task to evaluate the performance gains when IP CRCs are computed by the internal EMAC. This will require adding a signaling mechanism for lwIP. The stack_input_info field can be extended to carry such metadata (it currently supports timestamps but can be expanded for other information). I would definitely support this activity — even a proof-of-concept with performance metrics would be very useful.
Firstly, through tracing, I found that the emac->parent.receive method is not used anywhere except within the emac_xxx_task function...
This is mostly retained for historical reasons. I have considered removing it in the past but kept it because it could allow direct MAC reads from higher layers via mac->receive. While not likely used in production, it might be useful in debugging or testing. That said, I personally haven’t used it, so I’m open to discussing its removal.
Use of memcpy from buffer to stack_input in the driver
I agree with you, and I also try to avoid extra memcpy calls as much as possible. However, there was a reason for it. Allocating DMA-capable memory at the start ensures we can always perform the SPI transaction, since SPI requires DMA-capable memory. If we used PSRAM, we would still have memory for frame processing in lwIP, but we might not have any DMA-capable memory left for the SPI transaction. However, feel free to optimize the driver as per your suggestion; just keep this possible limitation in mind. My last comment on the proposal concerns the use of realloc. If I'm not mistaken, the standard does not guarantee that realloc never copies memory during reallocation. Therefore it would be safer to skip the reallocation and just pass the shortened length (*length -= ETH_CRC_LEN;) to the higher layers.
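A realloc-free variant along those lines might look roughly like the following host-side sketch. All `model_*` names are hypothetical, plain `malloc` stands in for `heap_caps_malloc(..., MALLOC_CAP_DMA)`, and the two constants are assumed to mirror `ETH_CRC_LEN` and `ETH_MAX_PACKET_SIZE` from esp_eth.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_ETH_CRC_LEN         4u    /* assumed equal to ETH_CRC_LEN */
#define MODEL_ETH_MAX_PACKET_SIZE 1522u /* assumed equal to ETH_MAX_PACKET_SIZE */

/* Hypothetical stand-in for reading `len` bytes of RX memory from the chip. */
typedef void (*model_read_fn)(uint8_t *dst, uint32_t len, void *ctx);

/* Test helper: fills the destination with a counting pattern. */
void model_fill_pattern(uint8_t *dst, uint32_t len, void *ctx)
{
    (void)ctx;
    for (uint32_t i = 0; i < len; i++) {
        dst[i] = (uint8_t)i;
    }
}

/* Reads one frame of `raw_len` bytes (CRC included) into a fresh buffer.
 * Returns 0 on success, -1 on an invalid length or allocation failure.
 * On success *buf holds raw_len bytes, while *length excludes the CRC. */
int model_receive_no_realloc(uint32_t raw_len, model_read_fn read_fn, void *ctx,
                             uint8_t **buf, uint32_t *length)
{
    assert(read_fn != NULL && buf != NULL && length != NULL);
    *buf = NULL;
    *length = 0;

    if (raw_len <= MODEL_ETH_CRC_LEN || raw_len > MODEL_ETH_MAX_PACKET_SIZE) {
        return -1;
    }
    uint8_t *p = malloc(raw_len); /* stands in for a DMA-capable allocation */
    if (p == NULL) {
        return -1;
    }
    read_fn(p, raw_len, ctx);

    *buf = p;
    /* Only the reported length shrinks; the buffer is neither moved nor copied. */
    *length = raw_len - MODEL_ETH_CRC_LEN;
    return 0;
}
```

The buffer stays allocated at its original size, so the four CRC bytes remain at its tail but are never exposed to the caller. Freeing the pointer later releases the whole block as usual.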
Thanks again for sharing such a thorough review. I find many of your Ethernet-related observations useful, and I’m open to discussing and reviewing changes for the points I explicitly marked as needing action. For proposals I haven’t marked (for example, replacing per-interface receive tasks with a single shared task), I’m not planning to make such changes at this stage, but I'm still open to discussion. Regarding lwIP optimizations, that belongs in a different thread with different people. I can give you a contact if you are interested in such activity.
@kostaond can you share any usage of mac->receive? I believe it would make sense in my application, but last time I asked Espressif they said to just use callbacks instead.
@owenthewizard what Ethernet MAC do you use? Internal or any of SPI ETH chips?
In general the steps should be the following:
- Don't use IP stack and don't set receive callback function. You can make sure it is not set by calling esp_eth_update_input_path and esp_eth_update_input_path_info with callback argument set to NULL.
- Then periodically or on some event defined by you just call:
uint8_t buf[1500];
mac->receive(mac, buf, sizeof(buf));
Note that I haven't actually tested it, and it is a very specific use case which can cause you more trouble (you can miss frames, etc.). That's the reason why using callbacks is always recommended.
@kostaond Here's my test code:
#include "esp_eth.h"
#include "esp_event.h"
#include "esp_log.h"
#include <inttypes.h>
static const char *TAG = "ethtest";
/** Event handler for Ethernet events */
static void eth_event_handler(void *arg, esp_event_base_t event_base,
int32_t event_id, void *event_data) {
uint8_t mac_addr[6] = {0};
/* we can get the ethernet driver handle from event data */
esp_eth_handle_t eth_handle = *(esp_eth_handle_t *)event_data;
switch (event_id) {
case ETHERNET_EVENT_CONNECTED:
esp_eth_ioctl(eth_handle, ETH_CMD_G_MAC_ADDR, mac_addr);
ESP_LOGI(TAG, "Ethernet Link Up");
ESP_LOGI(TAG, "Ethernet HW Addr %02x:%02x:%02x:%02x:%02x:%02x", mac_addr[0],
mac_addr[1], mac_addr[2], mac_addr[3], mac_addr[4], mac_addr[5]);
break;
case ETHERNET_EVENT_DISCONNECTED:
ESP_LOGI(TAG, "Ethernet Link Down");
break;
case ETHERNET_EVENT_START:
ESP_LOGI(TAG, "Ethernet Started");
break;
case ETHERNET_EVENT_STOP:
ESP_LOGI(TAG, "Ethernet Stopped");
break;
default:
break;
}
}
void app_main(void) {
printf("Hello world!\n");
eth_mac_config_t mac_config = ETH_MAC_DEFAULT_CONFIG();
eth_esp32_emac_config_t esp32_emac_config = ETH_ESP32_EMAC_DEFAULT_CONFIG();
esp32_emac_config.smi_mdc_gpio_num = 23;
esp32_emac_config.smi_mdio_gpio_num = 18;
esp32_emac_config.interface = EMAC_DATA_INTERFACE_RMII;
esp32_emac_config.clock_config.rmii.clock_mode = EMAC_CLK_EXT_IN;
esp32_emac_config.clock_config.rmii.clock_gpio = 0;
esp_eth_mac_t *mac = esp_eth_mac_new_esp32(&esp32_emac_config, &mac_config);
eth_phy_config_t phy_config = ETH_PHY_DEFAULT_CONFIG();
phy_config.phy_addr = 0;
phy_config.reset_gpio_num = 12;
esp_eth_phy_t *phy = esp_eth_phy_new_rtl8201(&phy_config);
esp_eth_config_t config = ETH_DEFAULT_CONFIG(mac, phy);
esp_eth_handle_t eth_handle = NULL;
ESP_ERROR_CHECK(esp_eth_driver_install(&config, &eth_handle));
esp_event_loop_create_default();
esp_event_handler_register(ETH_EVENT, ESP_EVENT_ANY_ID, &eth_event_handler,
NULL);
esp_eth_start(eth_handle);
uint8_t buf[1522];
while (1) {
uint32_t len = 1522;
int ret = mac->receive(mac, buf, &len);
ESP_LOGW(TAG, "receive status: %d", ret);
ESP_LOGW(TAG, "buf len after allocation: %" PRIu32, len);
}
}
This is a LilyGo T-ETH Lite; I've also tried on a WT32-ETH01 (LAN8720).
@owenthewizard does it work as you expected? And I see, I made a mistake in my mac->receive call example above 😄 I was really in a rush. It's nice to see you fixed them both!
@kostaond I'm sorry, I should have included the output. I'm not at my computer right now, but it just loops endlessly with buffer len 0, i.e. nothing received. I should probably do another test sending a bunch of frames to the LAN interface to ensure there's data and maybe a small delay. I can test that later today.
Thank you for your help, let me know if there's somewhere else you'd like to move this, since I'm hijacking this issue 😬 .