FreeRTOS-Plus-TCP icon indicating copy to clipboard operation
FreeRTOS-Plus-TCP copied to clipboard

Unified STM32 Network Interface

Open HTRamsey opened this issue 2 years ago • 72 comments

New Unified STM32 driver for F4, F7, H5, H7.

Description

Implementation of new STM32 drivers. Does not support f2.

Test Steps

Checklist:

  • [ ] I have tested my changes. No regression in existing tests.
  • [ ] I have modified and/or added unit-tests to cover the code changes in this Pull Request.

Related Issue

#766

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

HTRamsey avatar Mar 21 '23 03:03 HTRamsey

@htibosch @AniruddhaKanhere Looks like this one NetworkInterface should work for all of Fxx & Hxx. The biggest difference between the HAL drivers are the usage of a few register macros that are named differently, but we can easily abstract that out. I've tested on a F767ZI and can test a H755ZI-Q, but don't have a way to test H5 at the moment.

HTRamsey avatar Mar 23 '23 20:03 HTRamsey

@holden-zenith wrote:

I've tested on a F767ZI and can test a H755ZI-Q, but don't have a way to test H5 at the moment.

If I'm not mistaken, I have an H747, F407, and F746 for testing.

So e.g. when testing F407, I use the latest HAL drivers for that part?

htibosch avatar Mar 24 '23 04:03 htibosch

@htibosch yes, last time I checked the latest F4 & F7 Eth drivers were 100% the same, so it should work the same. Before you test, I just noticed the call to HAL_ETH_ReleaseTxPacket needs to be moved to EMAC_IF_TX_EVENT because it calls the non interrupt safe version of vReleaseNetworkBufferAndDescriptor. I was using vNetworkBufferReleaseFromISR in the HAL_ETH_TxFreeCallback, but I guess it needs to be handled in the task to be compatible with both buffer allocation methods.

HTRamsey avatar Mar 24 '23 04:03 HTRamsey

but I guess it needs to be handled in the task to be compatible with both buffer allocation methods.

As you may have read, I also prefer to handle network buffers from a normal task, rather than from an interrupt.

I just came across this test:

    bool isInterrupt()
    {
        return (SCB->ICSR & SCB_ICSR_VECTACTIVE_Msk) != 0 ;
    }

which could be used to choose the correct network buffer function.

last time I checked the latest F4 & F7 Eth drivers were 100% the same

Doesn't F7 have multiple TX buffers each with their own hardware priority? I never used that feature, but I did have to disable it explicitly.

Would you mind to send me an email so I can ask you little things now and then? hein [at] htibosch [dot] net

Thanks,

htibosch avatar Mar 24 '23 05:03 htibosch

HI @holden-zenith thanks for creating the change. I wanted to get a little bit of context. Please help in explaining the thought behind coming up with this new driver. I see that the older H and F series portable layers are moved inside legacy and the new driver has an integrated NetworkInterface.c file. Is the idea to have a single file to handle all the multiple stm32 SKUs.

shubnil avatar Mar 31 '23 09:03 shubnil

@shubnil Historically the STM32 HAL Eth drivers have been awful, which is part of the reason why edited HAL drivers were provided with the NetworkInterface.c. The newer HAL drivers are 'better' and all STM32's from F4, F7, H5, H7 essentially use the same driver now. So with this, edited HAL drivers are no longer necessary, it will be much easier to update to accommodate for future HAL updates, only one NetworkInterface is needed, and I am getting about 20mbps higher on iperf. I left the legacy drivers for now since anything prior to F4 won't be getting the HAL update, so technically I suppose only the Fxx driver would be necessary to leave.

HTRamsey avatar Mar 31 '23 14:03 HTRamsey

@holden-zenith , do you have a new version of NetworkInterface.c to test?

htibosch avatar Apr 01 '23 09:04 htibosch

@htibosch I'll probably push it in the next few days. I've been able to max out the bandwidth at 95mbps but it's not consistent and depends on compilation optimization. So I need to spend some time looking through the disassembly before I push it.

HTRamsey avatar Apr 02 '23 03:04 HTRamsey

@holden-zenith What's the status of this? I have a Nucleo-H563ZI board and would like to use FreeRTOS+TCP but the current port implementation does not support H5 series.

bjsowa avatar Jul 05 '23 14:07 bjsowa

@bjsowa Actually it would be great if you could test it. I have used it successfully on an F7 but there is potential concern with how it compares to the older driver in terms of performance, at least when tested with an F4. I do not have an H5 at the moment to try it.

HTRamsey avatar Jul 05 '23 19:07 HTRamsey

@holden-zenith I made the driver work for my nucleo board (at least DHCP and ping replies seem to work). I posted the modified NetworkInterface.c with some comments here: holden-zenith/FreeRTOS-Plus-TCP#1

I also ported it to work for version 4.0.0 of FreeRTOS-Plus-TCP (as I want IPv6 support) by mimicking the changes made in #602 for the current driver.

bjsowa avatar Jul 06 '23 00:07 bjsowa

On bjsowa/FreeRTOS-Plus-TCP#1 I integrated the new driver with the CMake build system. You can use this for reference.

bjsowa avatar Jul 06 '23 12:07 bjsowa

@bjsowa oh nice that's great. I am working on testing an H7 too so all of F4, F7, H5, & H7 will have been tested. Seems to be at least usable on all of them for the time being. In theory it should perform at least as well as the last driver but iperf apparently has had mixed results so far.

HTRamsey avatar Jul 06 '23 13:07 HTRamsey

The earlier drivers for STM32Fxx and STM32Hxx used a copy of ST's HAL driver. The driver was optimised for efficiency/speed, and support for zero-copy was added.

@holden-zenith created a unified driver that is based on today's HAL drivers. The driver files must be obtained from ST.

I tried the new driver on STM32F4 and STM32F7, and found that iperf3 shows lower speeds, both TX and RX. I wonder what the influence is of the extra mutex?

I've got no testing equipment where I am now. In August I can do a real-time analyses of both drivers.

htibosch avatar Jul 06 '23 15:07 htibosch

@holden-zenith I made the driver work for my nucleo board (at least DHCP and ping replies seem to work). I posted the modified NetworkInterface.c with some comments here: holden-zenith#1

I also ported it to work for version 4.0.0 of FreeRTOS-Plus-TCP (as I want IPv6 support) by mimicking the changes made in #602 for the current driver.

@bjsowa Thanks for trying out the release candidate versions of 4.0.0. Did you face any difficulties or issues with IPv6? Also curious to know if you use the multiple endpoint feature? PS: We will be soon doing a formal release (targeting mid August). The stack is currently undergoing security review and penetration testing.

amazonKamath avatar Jul 18 '23 12:07 amazonKamath

@htibosch Here are the ping results of the latest version. It seems to have much better latency, at least for me. Old Driver: Screenshot from 2023-08-12 00-22-29 New Driver: Screenshot from 2023-08-12 00-21-59

HTRamsey avatar Aug 12 '23 04:08 HTRamsey

@holden-zenith wrote:

Here are the ping results of the latest version. It seems to have much better latency, at least for me.

That looks very promising. Dit you already commit and push the changes? Do you want me to test it on a particular platform?

htibosch avatar Aug 14 '23 09:08 htibosch

I've pushed what I used in those tests. You can hold off on testing until I've invested a bit more time into an in-depth timing review and finished implementation, but at some point it would be interesting to see your latency results on the F4 similar to the tests I posted, and maybe on the H5 too @bjsowa.

I'm still not sure about your iperf results, but I know that the configuration values I selected would drastically alter the results so it took quite a bit of tweaking to get upwards of 90mbps. Perhaps checking some of the metrics/stats would be informative, like the idle time and such.

HTRamsey avatar Aug 16 '23 17:08 HTRamsey

@holden-zenith I've tried your updated driver on H5 and so far it seems to work. I was not able to use mDNS though but I assume this is yet to be implemented.

As I'm using cmake I had to add the new driver to interface selection (holden-zenith/FreeRTOS-Plus-TCP#2). Also to avoid compilation errors, I needed to change:

#include "stm32h5xx_hal_eth.h"

to

#include "stm32h5xx_hal.h"

I tried to test the throughput with Iperf but when I ported the existing iperf3 implementation for FreeRTOS-Plus-TCP 4.0.0 version, it resulted in hard faults and I gave up on trying to debug it. If you have a working iperf implementation I can test the performance.

bjsowa avatar Aug 18 '23 13:08 bjsowa

I am trying to get your implementation running on a simple F746 Nucleo Board but it hangs as soon a TCP package (eg. ping request) arrives. I have uploaded the whole project: https://github.com/mhorsche/F746_NucleoTemplate/tree/freertos

The UART output of my sample project:

#### Template v0.0.1: Aug 18 2023 14:55:49 ####
(DevID: 0x449 / RevID: 0x1001 / 1024 kByte)

prvIPTask started
PHY ID 7C130

+----+---------------+------+-------+-----------------+-------+
| no | Task name     | Prio | Stack |        Time(uS) |  Perc |
+----+---------------+------+-------+-----------------+-------+
|  0 | LED           |   17 |   842 |               0 |  0.00 |
|  1 | LOG           |    8 |   639 |               0 |  0.00 |
|  2 | IDLE          |    0 |   119 |               0 |  0.00 |
|  3 | IP-task       |   54 |   141 |            8095 | 96.37 |
|  4 | TIMER         |   55 |   231 |              15 |  0.18 |
+----+---------------+------+-------+-----------------+-------+
| Heap: min  34296 / free  34512 / total 202008 / stack  4096 |
| Network Buffers: min  64 / free  64 / total  64             |
+-------------------------------------------------------------+

xPhyReset: phyBMCR_RESET 0 ready
+TCP: advertise: 01E1 config 3100
Autonego ready: 00000004: full duplex 100 mbit high status
IP Address: 192.168.178.188
Subnet Mask: 255.255.255.0
Gateway IP Address: 192.168.178.1
DNS server IP Address: 208.67.222.222
ipARP_REQUEST from c0a8b264ip to c0a8b2c9ip end-point c0a8b2bcip
ipARP_REQUEST from c0a8b261ip to c0a8b2caip end-point c0a8b2bcip
ipARP_REQUEST from c0a8b264ip to c0a8b2c8ip end-point c0a8b2bcip
ipARP_REQUEST from c0a8b228ip to c0a8b277ip end-point c0a8b2bcip
ipARP_REQUEST from c0a8b264ip to c0a8b2c9ip end-point c0a8b2bcip
ipARP_REPLY from c0a8b202ip to c0a8b2bcip end-point c0a8b2bcip

Would be very thankful for any help!

mhorsche avatar Aug 18 '23 15:08 mhorsche

@bjsowa wrote:

I was not able to use mDNS though but I assume this is yet to be implemented.

Have you made sure that the multicast addresses are accepted?

MAC 33-33-00-00-00-fb
IPv6  ff02::fb
MAC 01-00-5e-00-00-fb
IPv4  224.0.0.251

htibosch avatar Aug 19 '23 09:08 htibosch

@holden-zenith Can you publish the PR and share the validation details. This will help us start reviewing and merge the changes.

shubnil avatar Aug 21 '23 06:08 shubnil

@mhorsche the stm drivers still have issues, one being that ETH_RX_BUF_SIZE is used instead of heth->Init.RxBuffLen which can cause problems during HAL_ETH_RxAllocateCallback if these two values are not the same.

HTRamsey avatar Aug 28 '23 01:08 HTRamsey

@mhorsche the stm drivers still have issues, one being that ETH_RX_BUF_SIZE is used instead of heth->Init.RxBuffLen which can cause problems during HAL_ETH_RxAllocateCallback if these two values are not the same.

Thanks for your reply. Sadly this did not solve the issue yet. I have changed both EMAC_DATA_BUFFER_SIZE and ETH_RX_BUF_SIZE to be the same size of 1536.

The program fails during vReleaseNetworkBuffer() trying to free pxIterator referencing to itself (using heap_4.c). image

If I change the memory management to heap_5.c (which I also used with the old TCP+ stack) the program fails at the same function or during heapBLOCK_IS_ALLOCATED. image

I am not 100 percent sure if all the MPU config and HeapReagion definition is correct but as it worked with the old stack it should be fine? I have also disable both I- and D-Cache to avoid any caching problems.

void vHeapInit(void)
{
  uint32_t ulStackSize = (uint32_t) & (_Min_Stack_Size);

  pucHeapStart = (uint8_t *)((((uint32_t)&__bss_end__) + 7) & ~0x07ul);

  ulHeapSize = (uint32_t)(&_estack - &__bss_end__);
  ulHeapSize &= ~0x07ul;
  ulHeapSize -= ulStackSize;

  HeapRegion_t xHeapRegions[] = {
      {(unsigned char *)pucHeapStart, ulHeapSize},
      {NULL, 0}};

  vPortDefineHeapRegions(xHeapRegions);
}

mhorsche avatar Aug 28 '23 14:08 mhorsche

I saw you had adapted the linker script I had posted which is for BufferAllocation1, can you try with that instead of Alloc2 or instead use your original linker script? I personally use a modified version of BufferAllocation1.

HTRamsey avatar Aug 28 '23 14:08 HTRamsey

I saw you had adapted the linker script I had posted which is for BufferAllocation1, can you try with that instead of Alloc2 or instead use your original linker script? I personally use a modified version of BufferAllocation1.

Works like charm now, thank you very much. I can confirm your ping flood results, compiled with GCC flags -Og -gdwarf-4:

$ sudo ping -f 192.168.178.188 -w 10
PING 192.168.178.188 (192.168.178.188) 56(84) bytes of data.
.
--- 192.168.178.188 ping statistics ---
38582 packets transmitted, 38581 received, 0.00259188% packet loss, time 10000ms
rtt min/avg/max/mdev = 0.149/0.177/1.677/0.022 ms, ipg/ewma 0.259/0.168 ms

mhorsche avatar Aug 28 '23 15:08 mhorsche

@holden-zenith I had occasionally encountered hard faults but I think I fixed the issue in holden-zenith/FreeRTOS-Plus-TCP#3. Could you take a look? Also, could you merge holden-zenith/FreeRTOS-Plus-TCP#2 ?

Have you made sure that the multicast addresses are accepted?

MAC 33-33-00-00-00-fb
IPv6  ff02::fb
MAC 01-00-5e-00-00-fb
IPv4  224.0.0.251

@htibosch The MAC addresses are filtered on driver level. There is currently no API for accepting MAC addresses.

bjsowa avatar Aug 30 '23 11:08 bjsowa

@bjsowa I will add that to my next commit which includes a lot of error handling recovery and such. Then the last major thing is to try and accommodate for various config macros.

I have a heavily modified iperf file but I haven't tried it yet on 4.0.0. I'll pass it along afterwards. In the mean time could you do a ping flood test and post the results?

HTRamsey avatar Sep 06 '23 14:09 HTRamsey

@bjsowa I will add that to my next commit which includes a lot of error handling recovery and such. Then the last major thing is to try and accommodate for various config macros.

I have a heavily modified iperf file but I haven't tried it yet on 4.0.0. I'll pass it along afterwards. In the mean time could you do a ping flood test and post the results?

I was able to modify iperf to make it usable on 4.0.0. I could achieve ~25-30 Mbps in TCP mode. I could not make UDP test to work.

Here's a result of ping flood:

$ sudo ping -f 10.42.0.27 -w 10
PING 10.42.0.27 (10.42.0.27) 56(84) bytes of data.
.
--- 10.42.0.27 ping statistics ---
67430 packets transmitted, 67429 received, 0.00148302% packet loss, time 10000ms
rtt min/avg/max/mdev = 0.051/0.132/6.674/0.057 ms, ipg/ewma 0.148/0.130 ms

I occasionally experience hard faults by dereferencing NULL on this line: https://github.com/holden-zenith/FreeRTOS-Plus-TCP/blob/dev-stm32/source/portable/NetworkInterface/STM32/NetworkInterface.c#L585

This happens when I have active TCP connection (server on STM, client on my computer) and I disconnect my computer from the network, the TCP socket on STM is closed and then I reconnect my computer to the network.

bjsowa avatar Sep 07 '23 07:09 bjsowa

Screenshot from 2023-10-07 07-20-53 Now that's a lot better

HTRamsey avatar Oct 07 '23 11:10 HTRamsey