Revert "Don't expose guard pages to malloc_stack API consumers"
Reverts JuliaLang/julia#54591
This cause the runtime to misbehave and crash, since all of the consumers of this information in the runtime assumed that the guard pages are accounted for correctly as part of the reserved allocation. Nothing in the runtime ever promised that it is valid to access the pages beyond the current redzone (indeed, ASAN would forbid it as well).
Cc @fingolfin
I have a hard time understanding the PR description, probably because I am misunderstanding something fundamental, sorry :-/. So some questions:
This cause the runtime to misbehave and crash, since all of the consumers of this information in the runtime
What are these consumers? The description makes it sound as if everything should burn and explode, but at least the Julia test suite seemed to pass.
assumed that the guard pages are accounted for correctly as part of the reserved allocation.
I guess what is "correct" depends on the definition of the interface. I read this as saying: "the semantics of malloc_stack and free_stack changed, but some code relied on the old semantics and got broken by the patch". Is that it?
Nothing in the runtime ever promised that it is valid to access the pages beyond the current redzone (indeed, ASAN would forbid it as well).
I have trouble following you at all here. Access which pages? Which redzones (is that ASAN terminology)?
Anyway: if this is reverted, then I'll dust of my old alternative patch, which modifies jl_active_task_stack to remove the guard page.
Just to explain, the goal of the removed patch was to to simplify scanning task stacks (for a conservative GC) for references to objects. For that we want to stop before hitting the guard page.
(To be clear, I am not objecting to the reversal, I just would like to understand to make sure there is not some deeper issue in my stack scanning code)
The task scanning should stop once it hits the current stack pointer, and never go all of the way to reaching these addresses. The code elsewhere detects if faults occur on the stack and treats them differently than other segfaults.
The redzone is the number of bytes past the current stack pointer that are also used by the function (https://en.wikipedia.org/wiki/Red_zone_(computing))
Yes, I agree --- you should only scan up to the actual stack pointer, and excluding guard pages was just a hack to attempt to make scanning work without knowing the stack pointer.
Also, 55555 55555 55555 55555 55555 !
We'd be happy to only scan to the stack pointer, but AFAIK so far it isn't available, or is it now? We've performed the conservative scanning of task stacks for many years now, and it worked well enough.
That said, if the SP is available (how?), then from my perspective ideally jl_active_task_stack would be adjusted to use it to adjust its return values
All the code in base (e.g. the unmap here and signal handling elsewhere) wants to know the full size of the memory region being managed, based upon the observed need to add jl_guard_size to every use of the size and base pointer of the region (lines diff here)