Arduino Evaluate stack guard implementation(s)

Evaluate stack guard implementation(s)

Open mcspr opened this issue 1 year ago • 10 comments

Right now we have at least three ways to ensure stack-smash does not happen.

CONT guard check, with two u32 at both start and the end our struct. Additionally, we implement stack size check and fill our local array with the same value. ref. https://gitter.im/esp8266/Arduino?at=630eddaff4d7a323dead0fcf and https://github.com/esp8266/Arduino/blob/master/cores/esp8266/cont_util.cpp cont_... functions related to checks and 're-painting'
-fstack-protector that instruments functions (also __attribute__((stack_protect)) to manually inject such checks) with guard variables and a special __stack_chk_{guard,fail()}; ref. https://gcc.gnu.org/onlinedocs/gccint/Stack-Smashing-Protection.html and https://github.com/esp8266/Arduino/blob/master/cores/esp8266/core_esp8266_postmortem.cpp implementations
StackThunk also implements guards for its stack through a similar mechanisms (ref. stack_thunk_fatal_overflow() and main implementation in https://github.com/esp8266/Arduino/blob/master/cores/esp8266/StackThunk.h)

Could GCC implementation supplement both StackThunk and CONT? Or, replace it? (see __attribute__ above) Should CONT checks randomize its guard value to separate stack contents themselves from structure guard members? Could we add something like address sanitizer that verifies that we don't go over the stack boundaries, not just protect us from writing things over one specific u32 value? w

Sep 02 '22 05:09 mcspr

The CONT and StackThunk guards and -fstack-protector are protecting against two different problems, I believe.

AIUI stack-protector checks for things like writing a local stack array beyond bounds to, say, inject a different return address or a higher-level call's locals. It has no idea what the real total stack limits are, only where the current function call's place on the stack it.

The CONT and StackThunk guards check for total stack overflow, after the fact. Too many levels of calls or too large locals. (i.e. every funciton was well behaved, but they needed, say, 5K of stack)

GCC never knows the real total stack limits, that's more of the OS's job to handle.

One idea toyed with before (there should be a PR or discussion w/a code sample in fact) was to set the HW data breakpoint to the last 16 (or other power-of-two) bytes of the stack. With that you would get the Cont/Thunk stack checker behavior, instantly.

A pie in the sky idea would be to adjust the GCC xtensa machine description to check any stack allocations at runtime against some well-known "current stack limits". Pretty massive performance penalty and it would not, obviously, do anything with the blob code.

Sep 02 '22 08:09 earlephilhower

Could we add something like address sanitizer that verifies that we don't go over the stack boundaries, not just protect us from writing things over one specific u32 value?

Just curious. What would you expect the code should do then? I get the idea of preventing to overwrite stuff outside some range instead of detecting already corrupted data. But what would be the alternative then? It is not like the compiler can detect all runtime situations where entering a function might cause a stack overflow, so then it will be a runtime thing. But then what? Throw an exception which in almost any project equals a crash anyway.

Edit: N.B. we already have some checks somewhere I guess, since I hit them every now and then when writing a proper new bug, where addressing something outside an allocated array causes a crash (load prohibited kind of error) so why not using the same for the stack implementation? Or is it already implemented as such?

Sep 02 '22 08:09 TD-er

Every variant of stack smashing protection seem to utilize a canary value that is either nearby the stack memory or in case of protector just added as an additional stack variable that is created on stack immediately as function is called. I think I misunderstood the way boundaries are set, so it does not seem that CONT could interact with the protector code that well just by checking that nearby memory is untouched :/

The repainting still seems suspect, don't either guards or paint value need to change so we don't mistake structure with the array itself when both values are equal?

The mentioned code that does HW breakpoint is https://github.com/esp8266/Arduino/issues/5158 (not sure which one of those gists is the correct one, and what are the downsides) Xtensa arch docs say both reads and writes are monitored, and it could work with up to a 64bytes range.

Machine-definition way seems to be related to -fstack-check as yet another instrumentation option https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html#index-fstack-check https://gcc.gnu.org/onlinedocs/gccint/Stack-Checking.html https://www.adacore.com/uploads/techPapers/Stack_Analysis.pdf Which splits into either machine specific way or a library-implemented one.

Option adds extra code at the start of the func and checks whether sp stays within bounds. ref. In GCC code I see probe_stack_range has some calls to stack_check_libfunc, which in case of ADA is _gnat_stack_check. But it does not look like it is something overridable, like stack-protector func is https://github.com/gcc-mirror/gcc/search?q=probe_stack_range

(or any other GCC instrumentation does not seem to help; even patchable function entry will inject nops just before func sets up any stack)

Sep 02 '22 11:09 mcspr

First, any memory guard methods other than breakpoint only affect artifacts compiled with them enabled.
Second, breakpoint is powerless for memory access violations other than its specified "the point".
Last, as already pointed out, what should we do after detecting it? It's hard to do anything useful under conditions where the stack is exhausted, and of course virtually impossible to recover from that situation.

Sep 03 '22 13:09 jjsuwa-sys3175

Second, breakpoint is powerless for memory access violations other than its specified "the point".

True, but isn't it enough for functions that allocate some stack and try to operate it in the 'forbidden' region? Just another layer on top of existing debugging options.

Last, as already pointed out, what should we do after detecting it? It's hard to do anything useful under conditions where the stack is exhausted, and of course virtually impossible to recover from that situation.

Coherent error reporting, at the very least. Like in the mentioned error sanitizer example or the way stack-check does it. The point is not to recover, but to reduce the time from getting the error and noticing 'oops this is a stack overflow / smashing'

(which btw is also mis-named in StackThunk)

Sep 03 '22 13:09 mcspr

Second, breakpoint is powerless for memory access violations other than its specified "the point".

True, but isn't it enough for functions that allocate some stack and try to operate it in the 'forbidden' region? Just another layer on top of existing debugging options.

Be careful that memory regions can often be accessed discontinuously with stride of tens (or more) of bytes.

Last, as already pointed out, what should we do after detecting it? It's hard to do anything useful under conditions where the stack is exhausted, and of course virtually impossible to recover from that situation.

Coherent error reporting, at the very least.

And that is all, I guess. It is much better to be interrupted by the stack guard in a potentially problematic place than by the WDT in a completely unrelated one.

Sep 03 '22 14:09 jjsuwa-sys3175

Just one thing you might want to look into, just to be sure about whether the current "stack painting" is working as it should. The compiler may sometimes optimize away code like memset or memzero if the same memory address is going to be changed right after such a call. There are lots of issues about this to be found online. A number of bugs (or attack vectors) in various crypto related code often can be related to such issues where memory isn't actually cleared even though the code would suggest it should be.

Sep 03 '22 14:09 TD-er

The compiler may sometimes optimize away code like memset or memzero if the same memory address is going to be changed right after such a call. There are lots of issues about this to be found online.

Hmm, didn't GCC have a counterpart to SecureZeroMemory() in VC++? Well anyway, the portable way to deal with that kind of problem is to explicitly clear the region in its own code using volatile pointer.

Sep 03 '22 14:09 jjsuwa-sys3175

Well anyway, the portable way to deal with that kind of problem is to explicitly clear the region in its own code using volatile pointer.

In GCC, we can use extended asm to force dependency injection:

__builtin_memset (array, 0, sizeof (array));
__asm__ volatile (""::"m"(array));

Now GCC recognizes that all the elements of array after __builtin_memset() are referenced in the no-operation but volatile (cannot be optimized away) asm statement.

Sep 03 '22 15:09 jjsuwa-sys3175

I don't think we have that issue with 'repaints' themselves? It is a separate func call, plus code does not seem to optimize to memset and simply uses loop.

Notably, we have explicit_bzero in libc for memset-0 case https://github.com/earlephilhower/newlib-xtensa/blob/xtensa-4_0_0-lock-arduino/newlib/libc/string/explicit_bzero.c

Comparing some code, FreeBSD does not employ the any barriers https://www.leidinger.net/FreeBSD/dox/libkern/html/d5/da7/explicit__bzero_8c_source.html (edit: and there's a discussion - https://reviews.freebsd.org/D4964 - as to why not having it may cause LTO eliminate the call)

glibc source does https://code.woboq.org/userspace/glibc/string/explicit_bzero.c.html

Will update with the stack errors reporting, so cont and bssl attach our postmortem stack pointer error via this, plus rename to separate smash from overflow https://github.com/esp8266/Arduino/blob/313b3c07ecccbe6fee24aa9fa447c4aed16ca499/cores/esp8266/core_esp8266_postmortem.cpp#L315 https://github.com/esp8266/Arduino/blob/313b3c07ecccbe6fee24aa9fa447c4aed16ca499/cores/esp8266/core_esp8266_postmortem.cpp#L156-L157

btw I noticed stack watchpoint / breakpoint is also implemented in esp-idf for tasks stack. Each time they switch, this check is set up https://github.com/espressif/esp-idf/blob/fde4afc67a2035a1f461dc511cd85dde1ef90368/components/freertos/FreeRTOS-Kernel/portable/riscv/port.c#L368-L373 https://github.com/espressif/esp-idf/search?q=vPortSetStackWatchpoint But, will still need to think on how it is going to be enabled, though. Maybe something for later.

Sep 06 '22 11:09 mcspr

Arduino Arduino copied to clipboard

Evaluate stack guard implementation(s)

Arduino
Arduino copied to clipboard