nuttx icon indicating copy to clipboard operation
nuttx copied to clipboard

driver/timers: ClockDevice, a new timer driver abstraction for NuttX.

Open Fix-Point opened this issue 1 month ago • 17 comments

Summary

This PR proposed ClockDevice, a new timer driver abstraction for NuttX. The new CLOCKDEVICE timer hardware abstraction delivers:

  • Functional correctness: Thread-safe and overflow-free
  • High performance: Up to 3x faster on ARMv7A platforms
  • Theoretically optimal timing precision: Uses hardware cycle counts as the time unit.
  • Time-unit independent interfaces: Decouples timer drivers from OS time subsystems
  • Minimalist driver implementation: Reduces driver code size by nearly 70%.

Situation

Let’s look at the current state of the timing subsystem in NuttX. image

The NuttX timing subsystem consists of four layers:

  • Hardware Timer Drivers: Includes implementations of various hardware timer drivers.
  • Timer Driver Abstraction: Such as Oneshot and Timer, which provide timer hardware abstraction.
  • OS Timer Interfaces: Timer(up_timer_*) and Alarm(up_alarm_*), offering relative and absolute timer interfaces.
  • OS Timer Abstraction: The wdog module manages software timers and provides a unified timer API to upper layers.

Here we focus on the oneshot timer driver abstraction.

So what are the current problem?

First, there are problems related to time conversion, primarily overflow and loss of precision.

Overflow: NuttX performs conversions between timespec and tick values, which can lead to overflow. For example, converting a timespec into ticks could cause multiplication to overflow, resulting in incorrect values.

Loss of Precision: The current ARM Generic Timer uses cycle_per_tick to simplify time calculations. However, this approach only works accurately if the frequency is divisible by USEC_PER_TICK. If not, significant precision loss can occur. For instance, on Cortex-R82, the timer frequency might be 2.73MHz, a value that often doesn’t divide cleanly, leading to timing inaccuracies.

Given how error-prone time conversion can be, why not provide a unified, reusable, and correct implementation for time conversion?

The second major issue is performance.

Unnecessary current time acquring: One contributing factor is unnecessary time acquring overhead. When setting a timer using a relative timer interface, even on hardware that natively supports absolute timing, the system incurs extra CPU cycles (around 60 cycles on x86) to read the current time.

Division Latency: Another source of overhead is division operations. Calculating the current tick or timespec value requires division. Due to the complex implemetation of division in hardware, it has significantly higher latency compared to other operations. This not only slows down performance but also introduces timing jitter.

Moreover, the driver is also responsible for managing callback functions and handling multi-core concurrency. Improper handling can easily lead to thread-safety issues.

ClockDevice

To address these issues, we propose ClockDevice, a better timer driver abstraction designed to replace the original oneshot API. It aims to achieve the following design goals:

  • Functional correctness (no overflow in time conversion)
  • Optimized performance
  • Simplified driver implementation
  • Expressive interfaces

ClockDevice implements two methods to accelerate time conversion:

Invariant-divisor Divsion (INVDIV) : used to convert clock counts to seconds or ticks. For all $n$ and divisor $d$, we can find $m$ and $s$, ensures $\lfloor \frac {n \times m}{2^{s}} \rfloor = \lfloor \frac{n}{d} \rfloor$ Benefit: Transform the division to 1x unsigned multiplication high (UMULH), 1x subtraction, 1x addition and 1x logical right-shifting (LShR).

Multiply-Shift Approximate Division: Used to convert delta counts into nanoseconds or ticks. This method, adopted in the Linux kernel for time conversion, trades off slight precision (a few nanoseconds) for better performance. However, due to potential multiplication overflow, it is only suitable for relative time conversions. The previous optimization ensures the accurate division but costs 6-9 CPU cycles. This method only needs 1x unsigned multiplication and 1x LShR, which usually costs 4 CPU cycles. Benefit: It exchanges the time precision for better performance.

Impact

These code commits affect the timing subsystem, as well as the following architecture:

  • arm-v7a/arm-v7r/arm-v8r
  • arm-v8a
  • riscv
  • sim
  • tricore
  • intel64

Testing

To evaluate the performance improvements, we conducted tests on three platforms: qemu-inte64/KVM, imx8qm-mek/arm64, and qemu-armv7a. We measured the CPU cycle overhead for:

  • Reading the current time
  • Setting a timer
  • Handling a timer callback

Each operation was executed 10 million times, and the results were averaged.

The results demonstrated significant performance improvments.

On qemu-armv7a, the software division is the performance bottle-neck. ClockDevice brings significant performance improvements:

API speed-up
clock_gettime 2.56x
clock_systime_ticks 3.00x
wd_start 1.67x
wd_start_cancel 1.43x
wd_expiration 2.36x
image

On qemu-inte64/KVM, ClockDevice achieved up to 1.42x performance improved. Especially, for clock_gettime API, NuttX with ClockDevice improvements had 1.31x better performance than Linux Kernel 6.8.0-51. image

On imx8qm-mek/arm64, ClockDevice achieved up to 1.68x performance improvement. Note on the early ARM64 platform (Cortex A-53), the INVDIV optimization can not work well since the hardware division instruction UDIV costs less CPU cycles on average than the INVDIV. image

Plan

Due to the need for extensive code modifications, some work remains unfinished. The following is a list of the planned tasks.

  • [x] 1. Simplify the timer drivers and add SMP initialization.
  • [x] 2. Remove the callback and arg from the oneshot API.
  • [x] 3. Remove tick-based oneshot API.
  • [x] 4. Add new count-based oneshot API (ClockDevice).
  • [x] 5. Introduce optimized fast-path for count-based oneshot API.
  • [x] 6. Reimplement the common-use timer drivers with count-based oneshot API.
  • [x] 7. Add document for ClockDevice.
  • [WIP] 8. Inlining arch_alarm for performance (3% ~ 5% less execution time for clock_gettime, wd_start and wd_expiration).

Architectures support:

  • [x] arm-v7a/v7r/v8r generic timer
  • [x] goldfish
  • [x] arm-v8a generic timer
  • [x] sim
  • [x] intel64 TSC-deadline timer
  • [x] tricore systimer
  • [x] risc-v mtime
  • [WIP] risc-v bl602 timer
  • [WIP] risc-v esp32c3 timer

Fix-Point avatar Nov 04 '25 03:11 Fix-Point

Hi @Fix-Point , bcm2711_timer.c has been removed but is still present in Make.defs and CMakefile.

https://github.com/Fix-Point/nuttx/blob/2dc2a345949e6f50830de48422b74a1fec92d473/arch/arm64/src/bcm2711/CMakeLists.txt#L24

https://github.com/Fix-Point/nuttx/blob/2dc2a345949e6f50830de48422b74a1fec92d473/arch/arm64/src/bcm2711/Make.defs#L29

even with the file imx9_timer.c‎ here https://github.com/Fix-Point/nuttx/blob/2dc2a345949e6f50830de48422b74a1fec92d473/arch/arm64/src/imx9/Make.defs#L31

simbit18 avatar Nov 04 '25 11:11 simbit18

Hi @Fix-Point, a few things:

  1. Great work! I know there are a lot of people who are excited for this change, including myself. I plan to re-watch your workshop presentation soon to get a better understanding
  2. As I think was mentioned, please include some extensive documentation about this new feature. How does it work, when should it be used, does it make any guarantees (big one is to not timeout before the actual duration has passed), etc. You can even link to your talk from the docs. It would be really helpful not only to new users but also for developers, as I see this is replacing a lot of other timer implementations (?)
  3. This PR is 47 commits long right now (not sure if that's intentional). I think we should split some of this up into multiple PRs if possible (i.e. one to add the feature, another for replacing existing timer usage with the new feature, etc.).

linguini1 avatar Nov 04 '25 17:11 linguini1

@Fix-Point nice work! Please create a Documentation/ explaining the idea of the ClockDevice driver abstraction, its APIs, simple usage example and the info you provided here in the Summary with those graphics

Understood, I will add relevant documentation in next few weeks.

Fix-Point avatar Nov 05 '25 02:11 Fix-Point

@Fix-Point nice work! Please create a Documentation/ explaining the idea of the ClockDevice driver abstraction, its APIs, simple usage example and the info you provided here in the Summary with those graphics

Understood, I will add relevant documentation in next few weeks.

Right, let move it to draft to avoid someone merging it accidentally.

acassis avatar Nov 05 '25 11:11 acassis

Hello all,

Maybe it's best if I wait for the documentation, but still. There is already work done in drivers/clk, which defines clk_s and clk_ops_s, though I do not know anything more about them. This ClockDevice seems to only talk about timers. Is there plan to also cover the work of drivers/clk? Or can we make it cover drivers/clk? It's a bit confusing using ClockDevice name without relating to drivers/clk.

Thanks.

tinnedkarma avatar Nov 05 '25 22:11 tinnedkarma

Hello all,

Maybe it's best if I wait for the documentation, but still. There is already work done in drivers/clk, which defines clk_s and clk_ops_s, though I do not know anything more about them. This ClockDevice seems to only talk about timers. Is there plan to also cover the work of drivers/clk? Or can we make it cover drivers/clk? It's a bit confusing using ClockDevice name without relating to drivers/clk.

Thanks.

it's different subsystem:

  • drivers/clk is for changing your peripheral(uart, spi, even timer) working clock frequency/phase etc.
  • drivers/timer is for getting the current timestamp and trigger callback on the specific interval.

since clk subsystem design principle is same as Linux counterpart, so you can learn from Linux doc too: https://docs.kernel.org/driver-api/clk.html

xiaoxiang781216 avatar Nov 06 '25 01:11 xiaoxiang781216

Please hold this PR and do not merge it easily. @Fix-Point Could you split this patch into multiple PRs:

  1. Split the ONESHOT_COUNT core implementation
  2. Split the division optimization
  3. Split the implementation for each architecture
  4. Split the defconfig changes

Before the defconfig update in phase 4, the new clock framework should not be enabled in the system. The ONESHOT_COUNT framework needs time to prove itself; each developer should decide for themselves whether to enable this feature.

anchao avatar Nov 06 '25 02:11 anchao

@Fix-Point nice work! Please create a Documentation/ explaining the idea of the ClockDevice driver abstraction, its APIs, simple usage example and the info you provided here in the Summary with those graphics

Understood, I will add relevant documentation in next few weeks.

Right, let move it to draft to avoid someone merging it accidentally.

The document has been added. Please check if it is appropriate or needs further refinement.

Fix-Point avatar Nov 10 '25 07:11 Fix-Point

Please hold this PR and do not merge it easily. @Fix-Point Could you split this patch into multiple PRs:

  1. Split the ONESHOT_COUNT core implementation
  2. Split the division optimization
  3. Split the implementation for each architecture
  4. Split the defconfig changes

Before the defconfig update in phase 4, the new clock framework should not be enabled in the system. The ONESHOT_COUNT framework needs time to prove itself; each developer should decide for themselves whether to enable this feature.

But we had to revert the tick-based interfaces, which would cause performance degradation on these platforms and incur the overhead of re-verification. Besides, I think the new ClockDevice drivers are more reliable than the original, since they have fixed many timer driver bugs. E.g, the tricore systimer will miss interrupt if the next count to be set has already expired.

Fix-Point avatar Nov 10 '25 07:11 Fix-Point

Please hold this PR and do not merge it easily. @Fix-Point Could you split this patch into multiple PRs:

  1. Split the ONESHOT_COUNT core implementation
  2. Split the division optimization
  3. Split the implementation for each architecture
  4. Split the defconfig changes

Before the defconfig update in phase 4, the new clock framework should not be enabled in the system. The ONESHOT_COUNT framework needs time to prove itself; each developer should decide for themselves whether to enable this feature.

But we had to revert the tick-based interfaces, which would cause performance degradation on these platforms and incur the overhead of re-verification. Besides, I think the new ClockDevice drivers are more reliable than the original, since they have fixed many timer driver bugs. E.g, the tricore systimer will miss interrupt if the next count to be set has already expired.

The ticks loss issue can be fixed by the counter register, and it shouldn't be a serious problem unless more than 2 toggle cycles are involved. The new framework needs further evaluation; although it's more accurate, I'm concerned it might increase CPU load, especially on 32-bit systems

anchao avatar Nov 10 '25 08:11 anchao

@Fix-Point please fix ci issue:

../nuttx/tools/checkpatch.sh -c -u -m -g 6088f6216b8fe8a7f2cc76742403f7147bb9f823..HEAD
Missing git commit message
Commit subject missing colon (e.g. 'subsystem: msg')
Missing git commit message
Commit subject missing colon (e.g. 'subsystem: msg')
Missing git commit message
Commit subject missing colon (e.g. 'subsystem: msg')
Missing git commit message
Missing git commit message
Missing git commit message
Missing git commit message
Used config files:
    1: .codespellrc
Some checks failed. For contributing guidelines, see:

reading sources... [100%] substitutions


Warning, treated as error:
/home/runner/work/nuttx/nuttx/Documentation/components/drivers/timers/oneshot/index.rst:14:Definition list ends without a blank line; unexpected unindent.
make: *** [Makefile:48: html] Error 2
Error: Process completed with exit code 2.

xiaoxiang781216 avatar Nov 10 '25 13:11 xiaoxiang781216

Please hold this PR and do not merge it easily. @Fix-Point Could you split this patch into multiple PRs:

  1. Split the ONESHOT_COUNT core implementation
  2. Split the division optimization
  3. Split the implementation for each architecture
  4. Split the defconfig changes

Before the defconfig update in phase 4, the new clock framework should not be enabled in the system. The ONESHOT_COUNT framework needs time to prove itself; each developer should decide for themselves whether to enable this feature.

But we had to revert the tick-based interfaces, which would cause performance degradation on these platforms and incur the overhead of re-verification. Besides, I think the new ClockDevice drivers are more reliable than the original, since they have fixed many timer driver bugs. E.g, the tricore systimer will miss interrupt if the next count to be set has already expired.

The ticks loss issue can be fixed by the counter register, and it shouldn't be a serious problem unless more than 2 toggle cycles are involved. The new framework needs further evaluation; although it's more accurate, I'm concerned it might increase CPU load, especially on 32-bit systems

tick unit impact up_?delay reported here: https://github.com/apache/nuttx/pull/17221

xiaoxiang781216 avatar Nov 10 '25 13:11 xiaoxiang781216

@Fix-Point please fix ci issue:

../nuttx/tools/checkpatch.sh -c -u -m -g 6088f6216b8fe8a7f2cc76742403f7147bb9f823..HEAD
Missing git commit message
Commit subject missing colon (e.g. 'subsystem: msg')
Missing git commit message
Commit subject missing colon (e.g. 'subsystem: msg')
Missing git commit message
Commit subject missing colon (e.g. 'subsystem: msg')
Missing git commit message
Missing git commit message
Missing git commit message
Missing git commit message
Used config files:
    1: .codespellrc
Some checks failed. For contributing guidelines, see:

reading sources... [100%] substitutions


Warning, treated as error:
/home/runner/work/nuttx/nuttx/Documentation/components/drivers/timers/oneshot/index.rst:14:Definition list ends without a blank line; unexpected unindent.
make: *** [Makefile:48: html] Error 2
Error: Process completed with exit code 2.

Done.

Fix-Point avatar Nov 11 '25 03:11 Fix-Point

@xiaoxiang781216 @Fix-Point This looks similar to the hrtimer driver I planned to add in my PR https://github.com/apache/nuttx/pull/17065, which was intended to provide access to the hardware timer count.

However, @xiaoxiang781216 mentioned in that PR that adding a new timer driver is not allowed in NuttX, so I removed the implementation, and reimplemented the sched/hrtimer based on the exsiting timer driver.

Your PR to introcude ClockDevice makes me feel confused about the criteria for what kind of functionality is acceptable to add to NuttX...

wangchdo avatar Nov 13 '25 02:11 wangchdo

@xiaoxiang781216 @Fix-Point This looks similar to the hrtimer driver I planned to add in my PR #17065, which was intended to provide access to the hardware timer count.

However, @xiaoxiang781216 mentioned in that PR that adding a new timer driver is not allowed in NuttX, so I removed the implementation, and reimplemented the sched/hrtimer based on the exsiting timer driver.

Your PR to introcude ClockDevice makes me feel confused about the criteria for what kind of functionality is acceptable to add to NuttX...

This is still oneshot driver, but change the tick callback to count callback since the tick version lose the accuary of hardware timer, other major improvement include:

  1. Provide the high efficient implementation of count<->tick and count<->timespec conversion in framework to simplify the driver developement.
  2. Modify all arch general timer to utlize the new improvement

but this change doesn't add any new arch up_xxx function, scheduler code doesn't call the count version, but sill call original up_xxx api.

All above improvement isn't related to hrtimer at all, both tick and tickless mode can get the benifit(performance improvent and driver simplication) from this PR.

The new hrtimer implementation doesn't add any new arch up_xxx function too, which continue use up_(alarm|timer)start and up(alarm|timer)_cancel.

xiaoxiang781216 avatar Nov 13 '25 06:11 xiaoxiang781216

@Fix-Point should we close this pr?

xiaoxiang781216 avatar Nov 24 '25 11:11 xiaoxiang781216

@Fix-Point should we close this pr?

Please hold off on this, I will close this PR after the 8. Inlining arch_alarm for performance (3% ~ 5% less execution time for clock_gettime, wd_start and wd_expiration). is finished and merged.

Fix-Point avatar Nov 24 '25 13:11 Fix-Point