STM32CubeH7 icon indicating copy to clipboard operation
STM32CubeH7 copied to clipboard

Ethernet: avoid descriptor leak with enabled timestamping

Open alsrgv opened this issue 3 years ago • 5 comments

Describe the set-up

  • The board (either ST RPN reference or your custom board): NUCLEO-H745ZI-Q
  • IDE or at least the compiler and its version: arm-none-eabi-gcc (GNU Arm Embedded Toolchain 10-2020-q4-major) 10.2.1 20201103 (release)

Describe the bug

Ethernet HAL configures interrupts to be received upon filling every RX descriptor. When timestamping is enabled, application needs to process both normal and context descriptors, but there is a race condition in HAL_ETH_IsRxDataAvailable() that might miss context descriptor, effectively leaking it because they won't be accounted for in HAL_ETH_BuildRxDescriptors().

How To Reproduce

  1. Indicate the global behavior of your application project

We're building a PTP master / slave implementation in Zephyr on STM32 that uses hardware timestamping. Majority of the implementation is represented by this commit: https://github.com/maka-autonomous-robotic-systems/zephyr/commit/c25c72f91996f69dcee737e405272edcf951f560

  1. The modules that you suspect to be the cause of the problem (Driver, BSP, MW ...)

drivers/src/stm32h7xx_hal_eth.c

  1. The use case that generates the problem

Receive timestamping is enabled, hardware is filling RX normal and context descriptors and issues interrupts. When driver code is awoken and runs HAL_ETH_IsRxDataAvailable(), it might either see just normal RX descriptor available or both normal and context RX descriptors, even if both were supposed to be available. In former case, context RX descriptor is populated later, but since it's not accounted by HAL_ETH_IsRxDataAvailable() it is never released back and is thus leaked.

  1. How we can reproduce the problem

One option is to use forked Zephyr code from this commit: https://github.com/maka-autonomous-robotic-systems/zephyr/commit/c25c72f91996f69dcee737e405272edcf951f560 and run gptp sample on STM32H7 board for a while. You'd want to revert this change (https://github.com/maka-autonomous-robotic-systems/zephyr/commit/1294620e92ef8ab89d03c7dc4978e69122c32e69) to switch back to unmodified STM32H7 HAL.

Another option is to make a custom example that enables timestamping similar to that commit, and observe descriptors leaked nondeterministically.

Additional context

PR with a proposed solution has been submitted to Zephyr Project's fork of STM32 HAL: https://github.com/zephyrproject-rtos/hal_stm32/pull/116

Please let me know if additional information would be helpful or if you have any feedback.

alsrgv avatar Nov 05 '21 05:11 alsrgv

Hi @ALABSTM, any update on this issue?

We'd like to get at least an acknowledgement of the issue, so the fix proposed in zephyrproject-rtos/hal_stm32#116 can be merged.

alsrgv avatar Nov 23 '21 19:11 alsrgv

Hi @alsrgv,

Please excuse this delayed reply. Our development teams said this issue is covered by the new ETH driver. This new ETH driver will be published soon. Unfortunately, we cannot share a date for the moment.

Thank you for your comprehension and thank you again for having reported this point.

With regards,

ALABSTM avatar Nov 24 '21 11:11 ALABSTM

Hi @alsrgv,

I hope you are fine. We just made available a fully reworked ETH HAL driver via pull-request stm32h7xx_hal_driver#16. The point you raised should have been addressed. You can use this new driver for your applications. Please feel free to provide your feedback here.

Please note also that this new ETH HAL driver breaks the compatibility with the previous one. We are looking forward to reading your feedback.

With regards,

ALABSTM avatar Dec 08 '21 11:12 ALABSTM

@ALABSTM, thanks for fixing this up! Do you know if anybody is working on porting Zephyr to this new API?

alsrgv avatar Dec 17 '21 07:12 alsrgv

Hi @alsrgv,

The point has already been reported at the Zephyr OS project-level. Here is the link.

With regards,

ALABSTM avatar Dec 22 '21 10:12 ALABSTM

Hi @alsrgv,

The new HAL ETH driver has been made available in version 1.10.0 of this firmware, which has been published since a while. Please allow me to close this issue. Thank you again for having reported this point.

With regards,

ALABSTM avatar Apr 10 '23 11:04 ALABSTM