core/debug: printf what asked to print
Contribution description
removes the unhelpful stack-size-test from debug-print
Testing procedure
Issues/PRs references
this is much more sane than #20166
Murdock results
:heavy_check_mark: PASSED
b39fbe23e389e581b9c5e76fd082e0c33ce1b317 core/debug: printf what asked to print
| Success | Failures | Total | Runtime |
|---|---|---|---|
| 8082 | 0 | 8082 | 10m:54s |
Artifacts
Can I haz an explanation why this check is no longer needed?
With a switch to picolibc we wouldn't get stack overflows on printf() that easily.
With newlib, this can still happen. We do have the MPU based stack overflow and the heuristic in place in addition, so that stack overflows should at least get caught. And an "the debug info you wanted to see cannot be shown until you increase stack size" and an "stack overflown" both have the same result: Bump the stack size and flash again. So there isn't much lost even when using newlib.
With a switch to picolibc we wouldn't get stack overflows on printf() that easily.
Not that easily but still possible, right? And as far as I understood newlib is still the default or am I wrong?
With newlib, this can still happen. We do have the MPU based stack overflow and the heuristic in place in addition, so that stack overflows should at least get caught.
On how many platforms do we have MPU support? And what is the "heuristic" one?
Most Cortex M boards have an MPU, but for some we don't use it to due bugs, if I recall correctly.
Some RISC-V MCUs also have an MPU (they call it differently, though). I'm not sure if we use that to guard the stack, though. @teufelchen should know.
The heuristic is the THREAD_CREATE_STACKTEST that fills the stack with canary values. (The address of a memory location is used as canary for that location.) The heuristic can fail if despite the stack overflow the canary is still there (hitting the value by chance is 1 to 2^32-1 on 32 bit systems). The heuristic in practice works pretty well, but it is enavled by default only with develhelp, if I recall correctly.
As far as I recall these MPUs (or at least some of them) only allow to protect a limited number of memory segments, i.e., cannot be used for arbitrary stack protection.
Regarding STACKTEST: that doesn't provide any stack protection but only shows stack usage. This feature already was in place when this macro was introduced but didn't help that much.
I totally agree that this macro hack is ugly as fuck but was introduced for a reason and I'm not convinced that reason has vanished.
@OlegHahm:
https://github.com/RIOT-OS/RIOT/blob/master/core/sched.c#L126-L139
The heuristic can fail if despite the stack overflow the canary is still there (hitting the value by chance is 1 to 2^32-1 on 32 bit systems). The heuristic in practice works pretty well, but it is enavled by default only with develhelp, if I recall correctly.
Actually the chance of overflowing the stack without hitting the test is much higher, as sth might just allocate a huge buffer on stack, just modifying the SP, but not writing to it. the test only covers changing the exact canary value, not anything below.
As far as I recall these MPUs (or at least some of them) only allow to protect a limited number of memory segments, i.e., cannot be used for arbitrary stack protection.
What we do here is to also switch segments when task switching. I'm not sure though whether we have a segment for the ISR stack.
Actually the chance of overflowing the stack without hitting the test is much higher, as sth might just allocate a huge buffer on stack, just modifying the SP, but not writing to it.
That is pretty like one of the vectors an attacker would try; it would also work when "jumping over" the MPU guard space.
The Linux kernel uses a similar mechanism to detect stack overflows, I think it has at least two unmapped pages after the stack ("after" here in terms of the direction the stack grows) in the virtual address, causing segfaults on most stack overflows. But jumping of the guard space is possible there as well. I think GCC can be told to touch at least one byte in every page it allocates on the stack to prevent this attack vector, but it requires binaries to be compiled with that magic. We might want to enable that as well (but with MPU guard area size instead of page size)?
I also experimented with enlarging the canary value to a proper redzone (bringing non-mpu checks closer to what the mpu does), but never got to PR that. this is the branch.
I like the idea, but I don't like zero being the magic number there. I think e.g. GCC when asked to write at least one byte to every page of stack allocation during the allocation will write zero to it. The stacktest approach would also make it more difficult to correctly guess the correct magic value, as it depends on memory layout. (But then again, the memory layout is reproducible and static within the same firmware.)
Maybe we could also just check one word on context switch, and have the idle thread check the red zones of all stacks before going to sleep? Or is the context switch overhead not as bad as I assume?
I like the idea, but I don't like zero being the magic number there.
I guess re-using the stack test approach makes sense.
Or is the context switch overhead not as bad as I assume?
TBH, I don't remember. I worked on that at the side like, three years ago. I'd assume the overhead to be prohibitive for production, though it might not be that bad. (assuming a word wise memory compare takes a cycle, in theory we're looking at only like, +10-20 cycles per context switch, which would be <5-10% overhead.)
Regarding this PR: the reason for this check was that enabling the debug macro may easily result in a stack overflow because of newlib's high stack usage for printf. In that case I would assume that the chances are actually not that bad that the scheduler stack test in place would catch these cases. Hence, the check may be obsolete indeed.
So, how about removing this check and see if we notice that people start to run into this type of problem on a regular basis, we can still revert the change.
As stated above: from my perspective, let's give it a try.