ztimer_periodic: occasional lockups
Description
I am seeing occasional lockups in my application of about 2 seconds. This began when I added a 10 msec repeating event using event_periodic. This was using ZTIMER_USEC so 10,000 ticks. Further digging lead me to add a print statement to the ISR code of ztimer_periodic. It was observed that occasionally the timer was firing 1 tick early, leading to the timer being continuously scheduled for 0 ticks in the future until the calculated target time looped back around and synced back up. The ISR contains this check to allow late callbacks to be caught back up, it does not account for early callbacks. The ztimer API doc does state that accuracy of +/- should be expected. At first, I believed that the error is with ztimer_periodic, as it should allow to be called back 1 tick early. I realize now that the stated accuracy is a comparison to wall clock time. While a callback can be 1 tick less than expected as measured externally, this would not be reflected in calls to ztimer_now(). I now suspect the problem is actually the ztimer_auto_adjust module, which I was using. I believe that the clock adjustment on init was measuring it to be one tick (or more) larger than the absolute largest we can ever expect (remember the basic accuracy is +/-1). This lead to the adjustment value sometime subtracting enough that some callbacks were actually 1 tick earlier than expected. I suspect this is not the only place in the code that could fail when this happens.
Steps to reproduce the issue
Apply the following patch:
diff --git a/tests/sys/ztimer_periodic/main.c b/tests/sys/ztimer_periodic/main.c
index 6ae3fdf53f..078bfa6eec 100644
--- a/tests/sys/ztimer_periodic/main.c
+++ b/tests/sys/ztimer_periodic/main.c
@@ -87,6 +87,7 @@ int main(void)
* result in a failing test, despite the timeout actually being triggered close enough
* to the target. */
uint32_t last = t.last - _intervals[j];
+ t.last += 1;
if (!ztimer_is_set(clock, &t.timer)) {
print_str("Test failed, timer not set\n");
Then run make BOARD=nucleo-f303ze flash test to simulate the failure I am seeing.
Expected results
Expect only 1 call of periodic callback per time period.
Actual results
Occasionally hangs for about 2 seconds while periodic callback is called as fast as possible.
Versions
I first observed in this branch of my fork
This is happening on an STM32H7 CPU clocked at 550 MHz. This is faster than many boards in-tree and the CPU has some features, such as out of order execution, which make benchmarking of code produce unexpected results for me at times. Maybe this could accunt for the alleged error with ztimer_auto_adjust.
$ make print-versions
Operating System Environment
----------------------------
Operating System: "Ubuntu" "22.04.5 LTS (Jammy Jellyfish)"
Kernel: Linux 6.8.0-52-generic x86_64 x86_64
System shell: /usr/bin/dash (probably dash)
make's shell: /usr/bin/dash (probably dash)
Installed compiler toolchains
-----------------------------
native gcc: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
arm-none-eabi-gcc: arm-none-eabi-gcc (15:10.3-2021.07-4) 10.3.1 20210621 (release)
avr-gcc: missing
msp430-elf-gcc: missing
riscv-none-elf-gcc: missing
riscv64-unknown-elf-gcc: missing
riscv32-esp-elf-gcc: missing
xtensa-esp32-elf-gcc: missing
xtensa-esp32s2-elf-gcc: missing
xtensa-esp32s3-elf-gcc: missing
xtensa-esp8266-elf-gcc: missing
clang: missing
Installed compiler libs
-----------------------
arm-none-eabi-newlib: "3.3.0"
msp430-elf-newlib: missing
riscv-none-elf-newlib: missing
riscv64-unknown-elf-newlib: missing
riscv32-esp-elf-newlib: missing
xtensa-esp32-elf-newlib: missing
xtensa-esp32s2-elf-newlib: missing
xtensa-esp32s3-elf-newlib: missing
xtensa-esp8266-elf-newlib: missing
avr-libc: missing (missing)
Installed development tools
---------------------------
ccache: missing
cmake: cmake version 3.22.1
cppcheck: missing
doxygen: 1.9.1
git: git version 2.34.1
make: GNU Make 4.3
openocd: Open On-Chip Debugger 0.11.0
python: missing
python2: missing
python3: Python 3.10.12
flake8: error: /usr/bin/python3: No module named flake8
coccinelle: missing
Hmm, I ran some tests setting: CONFIG_ZTIMER_AUTO_ADJUST_SETTLE = 1000000, CONFIG_ZTIMER_AUTO_ADJUST_BASE_ITVL = 5000, and CONFIG_ZTIMER_AUTO_ADJUST_ITER = 10000. ZTIMER_USEC->adjust_set remain unchanged (2). Not sure if this is a clue or not yet. When the lockup occurs, the callback is always 1 tick early. I am detecting the problem with the below modification to ztimer_periodic. When the issue happens, the print statement always shows that offset is 1 tick larger than timer->interval (fired 1 tick too early).
diff --git a/sys/ztimer/periodic.c b/sys/ztimer/periodic.c
index 70cba0ece2..3ff639f25e 100644
--- a/sys/ztimer/periodic.c
+++ b/sys/ztimer/periodic.c
@@ -32,10 +32,18 @@ static void _ztimer_periodic_reset(ztimer_periodic_t *timer, ztimer_now_t now)
ztimer_now_t offset = target - now;
if (offset > timer->interval) {
+ printf("%lu %lu %lu %lu\n"
+ , now
+ , target
+ , offset
+ , timer->interval
+ );
offset = 0;
+ timer->last = now;
+ }
+ else {
+ timer->last = target;
}
-
- timer->last = target;
ztimer_set(timer->clock, &timer->timer, offset);
}
Here is an exampe output:
2025-04-07 12:47:11,044 # 2593496270 2593696271 200001 200000
2025-04-07 12:48:44,635 # 2687096269 2687296270 200001 200000
2025-04-07 12:51:38,643 # 2861096268 2861296269 200001 200000
2025-04-07 12:53:29,249 # 2971696267 2971896268 200001 200000
2025-04-07 12:54:33,451 # 3035896266 3036096267 200001 200000
2025-04-07 12:54:37,832 # 3040274698 3040279699 5001 5000
2025-04-07 12:57:59,440 # 3241884545 3242084546 200001 200000
2025-04-07 12:59:31,880 # 3334319697 3334324698 5001 5000
2025-04-07 13:02:05,004 # 3487439696 3487444697 5001 5000
2025-04-07 13:03:55,978 # 3598409695 3598414696 5001 5000
2025-04-07 13:14:15,568 # 4217999694 4218004695 5001 5000