clang ethernet performance issue on star64

Open KurtWu10 opened this issue 2 months ago • 0 comments

There seems to be a performance issue with clang 19.1.7 (or clang 18) compared with the newer clang 20.1.8 (or clang 21).

With the preliminary patch below applied on top of the current main branch, only the newer clang delivers the expected performance in the echo server UDP benchmark on star64:

  • clang 19:

    Requested_Throughput,Receive_Throughput,Send_Throughput,Packet_Size,Minimum_RTT,Average_RTT,Maximum_RTT,Stdev_RTT,Median_RTT,Bad_Packets,Idle_Cycles,Total_Cycles
    970000000,821700381,969997878,1472,2396,2706,2924,111.31,2746,0,0,0
    
  • clang 20:

    Requested_Throughput,Receive_Throughput,Send_Throughput,Packet_Size,Minimum_RTT,Average_RTT,Maximum_RTT,Stdev_RTT,Median_RTT,Bad_Packets,Idle_Cycles,Total_Cycles
    970000000,954461588,969995994,1472,6156,6712,7154,238.39,6713,0,0,0
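
For a quick comparison, the received fraction of the requested throughput can be worked out from the two rows above. A minimal sketch in C; the helper name `received_percent` is made up for illustration and is not part of sDDF:

```c
/* Receive_Throughput as a percentage of Requested_Throughput.
 * Hypothetical helper, only for reading the CSV rows above. */
static double received_percent(unsigned long long requested,
                               unsigned long long received)
{
    return 100.0 * (double)received / (double)requested;
}
```

Calling `received_percent(970000000ULL, 821700381ULL)` gives about 84.7 for clang 19, and `received_percent(970000000ULL, 954461588ULL)` gives about 98.4 for clang 20.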
    

The issue does not occur without the patch.

The performance degradation is likely caused by an inefficiency in clang-19 when compiling lwip (in particular, network/ipstacks/lwip/src/core/memp.c), because high throughput is restored when lwip is compiled using clang-20 while the remaining components use clang-19.

A temporary fix is to compile with the -msmall-data-limit=0 flag; the limit defaults to 8 before clang-20.
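
For context, on RISC-V this flag sets the size threshold at or below which globals and statics become candidates for the small data sections (.sdata/.sbss), which are addressed relative to gp. The sketch below is a simplified model of that size check, not the compiler's actual code; `small_data_candidate` is a hypothetical name used only for illustration:

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified model (an assumption, not compiler source): an object of
 * `size` bytes is a small-data candidate when it is non-empty and no
 * larger than the configured limit. */
static bool small_data_candidate(size_t size, size_t limit)
{
    return size > 0 && size <= limit;
}
```

Under this model a limit of 8 makes 8-byte objects such as RV64 pointers candidates, while a limit of 0 makes nothing a candidate, so small-data placement is effectively disabled.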


patch:

diff --git a/drivers/network/dwmac-5.10a/ethernet.c b/drivers/network/dwmac-5.10a/ethernet.c
index 1fe5e884..4f1182f4 100644
--- a/drivers/network/dwmac-5.10a/ethernet.c
+++ b/drivers/network/dwmac-5.10a/ethernet.c
@@ -350,6 +350,12 @@ static void eth_init()
     *DMA_REG(DMA_CH0_RX_CONTROL) &= ~(DMA_CH0_RX_RBSZ_MASK);
     *DMA_REG(DMA_CH0_RX_CONTROL) |= (MAX_RX_FRAME_SZ << DMA_CH0_RX_RBSZ_POS);
 
+#if defined(CONFIG_PLAT_STAR64)
+    *DMA_REG(0x1100) |= (1 << 16);
+    *DMA_REG(DMA_CH0_TX_CONTROL) |= (2 << 16); // set PBL
+    *DMA_REG(DMA_CH0_RX_CONTROL) |= (8 << 16);
+#endif
+
     // Program the descriptor length. This is to tell the device that when
     // we reach the base addr + count, we should then wrap back around to
     // the base.
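
Note that the added lines OR the burst-length values into the registers without clearing the field first, unlike the RBSZ update just above them. If those fields can hold a nonzero value at this point, a clear-then-set helper avoids mixing old and new bits. A minimal sketch, assuming a 6-bit PBL field at bits 21:16 (an assumption to check against the DWMAC databook); `set_field`, `PBL_POS` and `PBL_MASK` are hypothetical names, not existing sDDF definitions:

```c
#include <stdint.h>

#define PBL_POS  16u
#define PBL_MASK (0x3Fu << PBL_POS) /* assumed field width: 6 bits */

/* Return `reg` with the masked field replaced by `value`.
 * Bits outside `mask` are preserved; excess bits of `value` are dropped. */
static inline uint32_t set_field(uint32_t reg, uint32_t mask,
                                 unsigned pos, uint32_t value)
{
    return (reg & ~mask) | ((value << pos) & mask);
}
```

For example, if a register already reads 0x00010001, `set_field(reg, PBL_MASK, PBL_POS, 2)` yields 0x00020001, whereas a plain `reg | (2 << 16)` would yield 0x00030001.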

KurtWu10, Oct 22 '25 13:10