RIOT icon indicating copy to clipboard operation
RIOT copied to clipboard

ztimer_periodic: occasional lockups

Open Enoch247 opened this issue 9 months ago • 1 comments

Description

I am seeing occasional lockups in my application of about 2 seconds. This began when I added a 10 msec repeating event using event_periodic. This was using ZTIMER_USEC so 10,000 ticks. Further digging lead me to add a print statement to the ISR code of ztimer_periodic. It was observed that occasionally the timer was firing 1 tick early, leading to the timer being continuously scheduled for 0 ticks in the future until the calculated target time looped back around and synced back up. The ISR contains this check to allow late callbacks to be caught back up, it does not account for early callbacks. The ztimer API doc does state that accuracy of +/- should be expected. At first, I believed that the error is with ztimer_periodic, as it should allow to be called back 1 tick early. I realize now that the stated accuracy is a comparison to wall clock time. While a callback can be 1 tick less than expected as measured externally, this would not be reflected in calls to ztimer_now(). I now suspect the problem is actually the ztimer_auto_adjust module, which I was using. I believe that the clock adjustment on init was measuring it to be one tick (or more) larger than the absolute largest we can ever expect (remember the basic accuracy is +/-1). This lead to the adjustment value sometime subtracting enough that some callbacks were actually 1 tick earlier than expected. I suspect this is not the only place in the code that could fail when this happens.

Steps to reproduce the issue

Apply the following patch:

diff --git a/tests/sys/ztimer_periodic/main.c b/tests/sys/ztimer_periodic/main.c
index 6ae3fdf53f..078bfa6eec 100644
--- a/tests/sys/ztimer_periodic/main.c
+++ b/tests/sys/ztimer_periodic/main.c
@@ -87,6 +87,7 @@ int main(void)
          * result in a failing test, despite the timeout actually being triggered close enough
          * to the target. */
         uint32_t last = t.last - _intervals[j];
+        t.last += 1;
 
         if (!ztimer_is_set(clock, &t.timer)) {
             print_str("Test failed, timer not set\n");

Then run make BOARD=nucleo-f303ze flash test to simulate the failure I am seeing.

Expected results

Expect only 1 call of periodic callback per time period.

Actual results

Occasionally hangs for about 2 seconds while periodic callback is called as fast as possible.

Versions

I first observed in this branch of my fork

This is happening on an STM32H7 CPU clocked at 550 MHz. This is faster than many boards in-tree and the CPU has some features, such as out of order execution, which make benchmarking of code produce unexpected results for me at times. Maybe this could accunt for the alleged error with ztimer_auto_adjust.

$ make print-versions 

Operating System Environment
----------------------------
         Operating System: "Ubuntu" "22.04.5 LTS (Jammy Jellyfish)"
                   Kernel: Linux 6.8.0-52-generic x86_64 x86_64
             System shell: /usr/bin/dash (probably dash)
             make's shell: /usr/bin/dash (probably dash)

Installed compiler toolchains
-----------------------------
               native gcc: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
        arm-none-eabi-gcc: arm-none-eabi-gcc (15:10.3-2021.07-4) 10.3.1 20210621 (release)
                  avr-gcc: missing
           msp430-elf-gcc: missing
       riscv-none-elf-gcc: missing
  riscv64-unknown-elf-gcc: missing
      riscv32-esp-elf-gcc: missing
     xtensa-esp32-elf-gcc: missing
   xtensa-esp32s2-elf-gcc: missing
   xtensa-esp32s3-elf-gcc: missing
   xtensa-esp8266-elf-gcc: missing
                    clang: missing

Installed compiler libs
-----------------------
     arm-none-eabi-newlib: "3.3.0"
        msp430-elf-newlib: missing
    riscv-none-elf-newlib: missing
riscv64-unknown-elf-newlib: missing
   riscv32-esp-elf-newlib: missing
  xtensa-esp32-elf-newlib: missing
xtensa-esp32s2-elf-newlib: missing
xtensa-esp32s3-elf-newlib: missing
xtensa-esp8266-elf-newlib: missing
                 avr-libc: missing (missing)

Installed development tools
---------------------------
                   ccache: missing
                    cmake: cmake version 3.22.1
                 cppcheck: missing
                  doxygen: 1.9.1
                      git: git version 2.34.1
                     make: GNU Make 4.3
                  openocd: Open On-Chip Debugger 0.11.0
                   python: missing
                  python2: missing
                  python3: Python 3.10.12
                   flake8: error: /usr/bin/python3: No module named flake8
               coccinelle: missing

Enoch247 avatar Apr 07 '25 15:04 Enoch247

Hmm, I ran some tests setting: CONFIG_ZTIMER_AUTO_ADJUST_SETTLE = 1000000, CONFIG_ZTIMER_AUTO_ADJUST_BASE_ITVL = 5000, and CONFIG_ZTIMER_AUTO_ADJUST_ITER = 10000. ZTIMER_USEC->adjust_set remain unchanged (2). Not sure if this is a clue or not yet. When the lockup occurs, the callback is always 1 tick early. I am detecting the problem with the below modification to ztimer_periodic. When the issue happens, the print statement always shows that offset is 1 tick larger than timer->interval (fired 1 tick too early).

diff --git a/sys/ztimer/periodic.c b/sys/ztimer/periodic.c
index 70cba0ece2..3ff639f25e 100644
--- a/sys/ztimer/periodic.c
+++ b/sys/ztimer/periodic.c
@@ -32,10 +32,18 @@ static void _ztimer_periodic_reset(ztimer_periodic_t *timer, ztimer_now_t now)
     ztimer_now_t offset = target - now;
 
     if (offset > timer->interval) {
+        printf("%lu %lu %lu %lu\n"
+            , now
+            , target
+            , offset
+            , timer->interval
+            );
         offset = 0;
+        timer->last = now;
+    }
+    else {
+        timer->last = target;
     }
-
-    timer->last = target;
 
     ztimer_set(timer->clock, &timer->timer, offset);
 }

Here is an exampe output:

2025-04-07 12:47:11,044 # 2593496270 2593696271 200001 200000
2025-04-07 12:48:44,635 # 2687096269 2687296270 200001 200000
2025-04-07 12:51:38,643 # 2861096268 2861296269 200001 200000
2025-04-07 12:53:29,249 # 2971696267 2971896268 200001 200000
2025-04-07 12:54:33,451 # 3035896266 3036096267 200001 200000
2025-04-07 12:54:37,832 # 3040274698 3040279699 5001 5000
2025-04-07 12:57:59,440 # 3241884545 3242084546 200001 200000
2025-04-07 12:59:31,880 # 3334319697 3334324698 5001 5000
2025-04-07 13:02:05,004 # 3487439696 3487444697 5001 5000
2025-04-07 13:03:55,978 # 3598409695 3598414696 5001 5000
2025-04-07 13:14:15,568 # 4217999694 4218004695 5001 5000

Enoch247 avatar Apr 07 '25 17:04 Enoch247