RFC stack bounds checking
I am opening this RFC because the topic has been brought up by @rezan in general on IRC and in this PR in particular https://github.com/varnish/varnish-modules/pull/180
I am making an effort to try to be fair to everyone involved. If you think that I misrepresented your opinion, please assume good intentions from my end and help by clarifying your opinion.
State of Affairs
varnish-cache currently has no stack bounds checking. Most operating systems have a guard page at the logical top of the stack (which in most cases is the virtual address bottom when the stack grows downwards), so that a stack overflow will trigger a segmentation fault, which is handled by our panic handler. This in turn has a pretty trivial heuristic to give advise to users about stack overflows, which is also documented in the troubleshooting guide.
We have a canary vtc to test if our default workspace and stack size settings make any sense at all on 64bit for what might be considered a reasonable call depth.
We routinely look after adjustments of the defaults when we notice any issues and at least from my perspective the number of tickets regarding stack overflows has come down to virtually zero.
So, while we can not guarantee that stack overflows will not happen, we have taken care that they are unlikely and, if they happen, users know what to do about them.
PCRE JIT: Where stack overflows are relevant
I am making the assumption that all production installations use PCRE JIT for performance reasons. It is enabled by default.
PCRE JIT has the advantage of requiring a fixed amount of stack in contrast to PCRE without JIT, which uses arbitrary amounts in theory, but can be limited by pcre_match_limit_recursion.
PCRE JIT uses a fixed size of 32k on the stack. Thus, for all practical purposes, stack overflows are almost guaranteed to happen in PCRE, when they happen. Most likely they will happen when VCL already used relevant amounts of stack for VCL subroutine calls.
Why does PCRE JIT need to use the machine stack?
It does not.
Alternatives have been discussed at length here and in particular here. Please make sure you do read these threads before you get into details of these questions.
The short version is that the official API way is pcre_assign_jit_stack(), which requires the stack to be allocated by pcre_jit_stack_alloc(). Following the code to sljit_allocate_stack shows that this is basically a mmap().
As we would never want to add a memory mapping for each pcre call, we would need to maintain a pool for pcre, which has been suggested in one of the emails referenced above.
But in the end keeping the JIT stack on the machine stack is plain and simple and I hope it is fair to summarize that it was generally perceived that having an appropriately sized machine stack is generally easier and the returns of the additional overhead of maintaining a PCRE JIT pool were probably not worth the efforts.
Where stack overflows are unlikely to happen
Conversely to where stack overflows are relevant, they are virtually irrelevant anywhere else, in particular in vmods. Unless vmods call into pcre, for all practical purposes, they can assume 32k of stack to be available.
Should we handle stack overflows at all?
IMHO this is the most important question to answer first.
Ideal stack bounds checking would fail individual requests. These failures would need to be noticed by an administrator, who would need to either reduce the required stack space (probably by reducing the function call depth) or dial up the stack space parameter.
Yes, we would not panic varnish. But we would still fail requests. Potentially all of them. From a practical perspective, is this actually better?
How would we handle stack overflows?
Unless I overlook any other options, there are two:
Know how much stack there is left and check for it before using any
The question how this would be done is a complication by itself which I would like to leave out for now for the sake of simplicity. Let's just assume there was a stack_free() function or macro which told us how much stack was left.
For full stack overflow protection, we would need to call stack_free() in every function which is to make other function calls, and compare the result to an estimate (upper bound) of how much stack space any function called by that function needs.
As this is highly impractical, I think this approach only makes sense to check particular cases like pcre calls or potentially larger stack allocations (this is the case discussed in https://github.com/varnish/varnish-modules/pull/180).
If such a practical approach was taken, we still would not guarantee to be free of stack overflows. We would just make them less likely by covering specific cases which we consider relevant.
Handle the segmentation fault
This involves setjmp() and longjmp(). We would detect the segmentation fault, detect that it is a stack overflow, log some error, increase a counter and bail out from down the stack to some safe point, which would error the current request.
While this might sound simple in theory, things get very messy very soon: What do we do about locks being potentially held? What about data structures? File descriptors? What if we were in some custom code we do not even have control over?
IIRC, @bsdphk simply wiped this off the table with a "no, we ware not going there", and, IMHO, very rightly so. longjmp-land is a place I would never want to go to.
so what?
So, IMHO, the only option would be Know how much stack there is left and check for it before using any with the limitations mentioned above.
We would still need a way to determine the stack limits and probably find either good libraries (cough libunwind cough) to help us with portability or do the job ourselves. And we would probably need to limit support of the stack checking feature to configurations which use contiguous stacks only.
But given all that, my personal opinion is that such a feature even with limited usability would still be worth it.
And actually we did have a prototype before. It was removed in 1a78b8022a9152b36c0994053d743e9bf4d927f9 after it was not picked up by anyone, probably in particular not by me. My understanding is that @bsdphk thought I wanted it for vmods, while I thought the main use case was built-in regular expressions. But having checks in core code was something he did not like. (Again, if I misrepresented your opinion, phk, please correct me).
As has probably become obvious, I submitted the first version of this ticket too early by accident. I hope to have fixed most mistakes from the initial writeup now.
bugwash:
- pcre2 might offer a way for stack bounds checking
- We want to switch to pcre2 #3559
This ticket has served its purpose and is not actionable any more