build system: use thread-safe stdio
Contribution description
build system: introduce bug_% feature category
This introduces a new feature category to declare which bugs are present in a given build that we cannot fix in RIOT, but need to work around. These bugs may be silicon bugs, software bugs in ROM, software bugs in binary blobs needed for a platform, or bugs in the toolchain.
These features behave the same way as arch_% features: Every provided
bug is always used, so that we can inspect FEATURES_USED to check
which bugs need to be worked around.
build system: use thread-safe stdio
This makes use of the new bug modeling to declare all platforms that can use newlib and have no reentrancy hooks as affected by the non-thread-safe stdio bug. (Which is every platform but ESP* and AVR, the former because the reentrancy hooks are provided, the latter because we do not and never will support newlib on them.)
Building on that, the mpaland-printf package is used when newlib is used and the bug is present. This way we can rely on the stdio being thread-safe on every platform and not causing random crashes at run time.
sys/newlib: drop workaround for stdio
This drops a workaround that initialized newlib's reentrancy structure on boot to reduce the chances of crashes when using the non-thread-safe (unless reentrancy hooks are provided) stdio implementation of newlib.
Now that the newlib stdio implementation is only ever used if it is thread-safe, we no longer need a workaround that reduces the chance of crashes on concurrent use of stdio.
Testing procedure
Now the package mpaland-printf should be pulled in for all non-ESP boards when newlib is used. E.g.
for board in hifive1b same54-xpro native32 native64 arduino-mega2560 esp32-mh-et-live-minikit esp8266-esp-12x z1 nucleo-l011k4; do
printf '\n%s\n===================\n' $board
echo "Features:"
make BOARD=$board --no-print-directory -C examples/basic/hello-world info-features-used | grep -E 'newlib|libc'
echo "Packages:"
make BOARD=$board --no-print-directory -C examples/basic/hello-world info-packages
done
hifive1b
===================
Features:
bug_newlib_broken_stdio
newlib
Packages:
mpaland-printf
same54-xpro
===================
Features:
bug_newlib_broken_stdio
newlib
Packages:
cmsis
mpaland-printf
native32
===================
Features:
Packages:
native64
===================
Features:
Packages:
arduino-mega2560
===================
Features:
Packages:
esp32-mh-et-live-minikit
===================
Features:
newlib
Packages:
esp32_sdk
esp8266-esp-12x
===================
Features:
newlib
Packages:
esp8266_sdk
z1
===================
Features:
bug_newlib_broken_stdio
Packages:
mpaland-printf
nucleo-l011k4
===================
Features:
bug_newlib_broken_stdio
picolibc
Packages:
cmsis
With this, the test in tests/sys/snprintf now should also pass for each and every board, except for ESP32 boards. The reason is that, since ESP* provides the reentrancy hooks, newlib's stdio is thread-safe on those. So we do not pull in the alternative printf. For ESP8266 boards newlib is configured to use the non-nano stdio, which passes the test. For ESP32 newlib-nano is used, which does not implement all format specifiers and, therefore, fails the test.
Issues/PRs references
Depends on and includes:
- [x] https://github.com/RIOT-OS/RIOT/pull/21439
Murdock results
:heavy_check_mark: PASSED
4a84f6f58e47c330c813a5eba5ab6efe45d6c64c tests/sys/snprintf: use printf_long_long
| Success | Failures | Total | Runtime |
|---|---|---|---|
| 10536 | 0 | 10536 | 20m:02s |
Artifacts
We should probably give this a full build without fast fail, otherwise you'll be chasing your tail for days 😅
We should probably give this a full build without fast fail, otherwise you'll be chasing your tail for days 😅
No, actually the memory requirements will go down with this. Unless... both printf variants get linked in into the same app.
See https://github.com/RIOT-OS/RIOT/pull/21439 for a fix of that.
in one of your commit messages and in the description of this PR you have every misspelled.
This way we can rely on the stdio being thread-safe on evyer platform and not causing random crashes at run time.
For MSP430 both .text and .data slightly increase with this. This is not expected and needs to be investigated first.
Ah, 64 bit math is expensive on MSP430. Disabling support for long long and .data stays the same and .text goes down when switching to mpaland-printf.
I was hoping that mpaland-printf would be smaller for all platforms while supporting all features. But maybe support for printing long long should be a module to not cause regressions for MSP430. It could still be a default module on 32 bit and 64 bit platforms, were support for long long is super cheap.
Btw: you can use the built-in function to compare Build Sizes: https://doc.riot-os.org/advanced-build-system-tricks.html#comparing-build-sizes
Someone (👀👀) recently fixed it :D
I didn't want to install the RISC-V and ESP32 toolchains as well, so this is just a smaller set.
cbuec@W11nMate:~/RIOTstuff/riot-mpaland/RIOT/examples/basic/hello-world$ OLDBIN=master NEWBIN=mpaland make info-buildsizes-diff
using BOARD="native64" as "native" on a 64-bit system
text data bss dec BOARD/BINDIRBASE
22 0 0 22 arduino-mega2560
5532 118 904 6554 master
5554 118 904 6576 mpaland
24 0 0 24 native64
27566 1080 59128 87774 master
27590 1080 59128 87798 mpaland
20 0 0 20 nucleo-l011k4
5996 24 1772 7792 master
6016 24 1772 7812 mpaland
-3188 -108 -8 -3304 same54-xpro
12632 116 2308 15056 master
9444 8 2300 11752 mpaland
I think there is missing a RIOT_CI_BUILD=1,as the longer version (which includes the branch name unless it is master) is probably the cause for the size diff on AVR. (avrlibc's stdio is thread safe naturally and better optimized for 8 bit platforms than mpaland-printf, so sticking with the native printf there makes sense and is done here.) Same with the nucleo-l011k, which only supports picolibc. And on picolibc the native printf is also excellent.
Yes, with RIOT_CI_BUILD=1 the sizes are identical for the other boards.
https://github.com/RIOT-OS/RIOT/pull/21445 now makes 64 bit printing support optional. It makes that support opt-in for MSP430 (due it being expensive there), and opt-out for 32-bit systems (due to it being cheap there).
I'd like to do some more testing before hitting the green button on this one. There seems to be something off on RISC-V with this PR, at least with the BUILD_IN_DOCKER=1 toolchain.
OK, on RISC-V enabling support for long long printing is also not almost free. I dropped the soft dependency on printf_long_long from arch32_bit (turning opt-out to opt-in).
The soft dependency from arch64_bit to printf_long_long was also not correct: Without printf_long_long on 64-bit, %p will not work. Hence, 64-bit platforms need a hard dependency on printf_long_long.
Finally, on ARM (both legacy and Cortex M) I could not see any difference in memory with printf_long_long enabled. Would be nice to understand the background of this first before merging it. But if that is expected behavior, than enabling printf_long_long by default on ARM makes sense, as it adds more features for no cost.
Would be nice to understand the background of this first before merging it. But if that is expected behavior, than enabling printf_long_long by default on ARM makes sense, as it adds more features for no cost.
Are you still wanting to investigate this before merging? Should this PR be marked as draft?
Are you still wanting to investigate this before merging?
Just did and it looks like human error. On ARM it also safes ~1K of .text, so it behaves as expected.
I'll change the default to consistently only support long long when disabled via the printf_long_long module; with the one exception of 64 bit systems.
I think this is ready to get merged :)
I traced down the issue of the increased ROM size for the wolfSSL example to the fact that two assert() implementations are used in it: The newlib's and the RIOT one.
I added a commit that wraps the newlib's __assert_func() to RIOT's assertion failure handlers for more consistent behavior. In addition this will result in that assert() not pulling in low-level printf() functions from newlib even when newlib's stdio is wrapped.
This should now finally pass the CI
We should probably do a full build with no fast fail before trying to merge again.
But hitting the merge button would also do a full build. Doing a full build prior would just cause the 7 days of CPU time for a full build cycle to be spent twice on the same code.
Ah sorry for the noise, that's probably the removed printf_long_long support.
Should we rather add printf_long_long to the test application or feature-gate test_int64?
Ah sorry for the noise, that's probably the removed
printf_long_longsupport.
No, its IMO not noise. A test failing with default configuration is IMO a genuine bug :)
Should we rather add printf_long_long to the test application or feature-gate test_int64?
I did both. There might be users with tiny boards that can benefit from easily being able to disable printf_long_long in the test, but having the larger test coverage by default is probably better.
Please squash :)
Yay :tada:
Thanks everyone for bearing with me on this one :heart:
This resulted in some failing nightly tests, as mpaland-printf prints %p without leading 0x. See the unmerged PR over at https://github.com/mpaland/printf/pull/90.
As the printing format is indeed implementation-defined, the respective tests should probably be adapted to accept output both with and without 0x.
Another problem, at least on my machine:
$ make -C tests/sys/heap_cmd BOARD=nrf52840dk all
make: Entering directory '/home/mikolai/TUD/Code/RIOT/tests/sys/heap_cmd'
Building application "tests_heap_cmd" for "nrf52840dk" with CPU "nrf52".
"make" -C /home/mikolai/TUD/Code/RIOT/pkg/cmsis/
"make" -C /home/mikolai/TUD/Code/RIOT/pkg/mpaland-printf/
"make" -C /home/mikolai/.riot/pkg/mpaland-printf -f /home/mikolai/TUD/Code/RIOT/pkg/mpaland-printf/mpaland-printf.mk
"make" -C /home/mikolai/TUD/Code/RIOT/boards
"make" -C /home/mikolai/TUD/Code/RIOT/boards/common/init
"make" -C /home/mikolai/TUD/Code/RIOT/boards/nrf52840dk
"make" -C /home/mikolai/TUD/Code/RIOT/boards/common/nrf52xxxdk
"make" -C /home/mikolai/TUD/Code/RIOT/core
"make" -C /home/mikolai/TUD/Code/RIOT/core/lib
"make" -C /home/mikolai/TUD/Code/RIOT/cpu/nrf52
"make" -C /home/mikolai/TUD/Code/RIOT/cpu/cortexm_common
"make" -C /home/mikolai/TUD/Code/RIOT/cpu/cortexm_common/periph
"make" -C /home/mikolai/TUD/Code/RIOT/cpu/nrf52/periph
"make" -C /home/mikolai/TUD/Code/RIOT/cpu/nrf52/vectors
"make" -C /home/mikolai/TUD/Code/RIOT/cpu/nrf5x_common
"make" -C /home/mikolai/TUD/Code/RIOT/cpu/nrf5x_common/periph
"make" -C /home/mikolai/TUD/Code/RIOT/drivers
"make" -C /home/mikolai/TUD/Code/RIOT/drivers/periph_common
"make" -C /home/mikolai/TUD/Code/RIOT/sys
"make" -C /home/mikolai/TUD/Code/RIOT/sys/auto_init
"make" -C /home/mikolai/TUD/Code/RIOT/sys/div
"make" -C /home/mikolai/TUD/Code/RIOT/sys/isrpipe
"make" -C /home/mikolai/TUD/Code/RIOT/sys/libc
"make" -C /home/mikolai/TUD/Code/RIOT/sys/malloc_thread_safe
"make" -C /home/mikolai/TUD/Code/RIOT/sys/newlib_syscalls_default
"make" -C /home/mikolai/TUD/Code/RIOT/sys/preprocessor
"make" -C /home/mikolai/TUD/Code/RIOT/sys/ps
"make" -C /home/mikolai/TUD/Code/RIOT/sys/shell
"make" -C /home/mikolai/TUD/Code/RIOT/sys/shell/cmds
"make" -C /home/mikolai/TUD/Code/RIOT/sys/stdio
"make" -C /home/mikolai/TUD/Code/RIOT/sys/stdio_uart
"make" -C /home/mikolai/TUD/Code/RIOT/sys/test_utils/interactive_sync
"make" -C /home/mikolai/TUD/Code/RIOT/sys/test_utils/print_stack_usage
"make" -C /home/mikolai/TUD/Code/RIOT/sys/tsrb
/usr/lib/gcc/arm-none-eabi/12.2.1/../../../arm-none-eabi/bin/ld: /usr/lib/gcc/arm-none-eabi/12.2.1/../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard/libc_nano.a(lib_a-mstats.o): in function `_mstats_r':
/home/pere/src/newlib-salsa/build_nano/arm-none-eabi/thumb/v7e-m+fp/hard/newlib/libc/stdlib/../../../../../../../../newlib/libc/stdlib/mstats.c:121: undefined reference to `__wrap_fiprintf'
/usr/lib/gcc/arm-none-eabi/12.2.1/../../../arm-none-eabi/bin/ld: /home/pere/src/newlib-salsa/build_nano/arm-none-eabi/thumb/v7e-m+fp/hard/newlib/libc/stdlib/../../../../../../../../newlib/libc/stdlib/mstats.c:121: undefined reference to `__wrap_fiprintf'
/usr/lib/gcc/arm-none-eabi/12.2.1/../../../arm-none-eabi/bin/ld: /usr/lib/gcc/arm-none-eabi/12.2.1/../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard/libc_nano.a(lib_a-nano-mallstatsr.o): in function `_malloc_stats_r':
/home/pere/src/newlib-salsa/build_nano/arm-none-eabi/thumb/v7e-m+fp/hard/newlib/libc/stdlib/../../../../../../../../newlib/libc/stdlib/nano-mallocr.c:523: undefined reference to `__wrap_fiprintf'
/usr/lib/gcc/arm-none-eabi/12.2.1/../../../arm-none-eabi/bin/ld: /home/pere/src/newlib-salsa/build_nano/arm-none-eabi/thumb/v7e-m+fp/hard/newlib/libc/stdlib/../../../../../../../../newlib/libc/stdlib/nano-mallocr.c:525: undefined reference to `__wrap_fiprintf'
/usr/lib/gcc/arm-none-eabi/12.2.1/../../../arm-none-eabi/bin/ld: /home/pere/src/newlib-salsa/build_nano/arm-none-eabi/thumb/v7e-m+fp/hard/newlib/libc/stdlib/../../../../../../../../newlib/libc/stdlib/nano-mallocr.c:527: undefined reference to `__wrap_fiprintf'
collect2: error: ld returned 1 exit status
make: *** [/home/mikolai/TUD/Code/RIOT/tests/sys/heap_cmd/../../../Makefile.include:755: /home/mikolai/TUD/Code/RIOT/tests/sys/heap_cmd/bin/nrf52840dk/tests_heap_cmd.elf] Error 1
make: Leaving directory '/home/mikolai/TUD/Code/RIOT/tests/sys/heap_cmd'
https://github.com/RIOT-OS/RIOT/blob/c0cc617c7f6addd35ee8d2e827f768eabcf6b83b/pkg/mpaland-printf/Makefile.include#L16-L25
suggests that this is failing on purpose, but I'm not sure what to make out of this. Apparently that also didn't happen on CI for some reason (maybe it was not properly rebuilding all modules for the test?).
As the printing format is indeed implementation-defined, the respective tests should probably be adapted to accept output both with and without
0x.
There we go: https://github.com/RIOT-OS/RIOT/pull/21677
Hey,
commit build system: use thread-safe stdio introduced a severe performance penalty. Together with the penalty of slipmux, this makes the RIOT shell basically unusable slow (think >7s for running ps). I haven't measured yet how bad the issue is when not using slipmux, I suspect it's still so fast the human eye doesn't realize it otherwise this would have been spotted earlier. There is also the option that this is specific in conjunction to using slipmux but at the moment that seems unlikely to me but further digging is needed.
The issue immediatly goes away when disabling bug_newlib_broken_stdio for my board, so it originates in mpaland-printf.
Yes, mpaland-printf sends one char out at a time, as it does not do buffering. With a network as output, unbuffered output makes little sense.
@maribu do you have a rough idea how to approach this issue? I feel a bit lost to find a starting point.