julia icon indicating copy to clipboard operation
julia copied to clipboard

Revert "Don't expose guard pages to malloc_stack API consumers"

Open vtjnash opened this issue 1 year ago • 8 comments

Reverts JuliaLang/julia#54591

This cause the runtime to misbehave and crash, since all of the consumers of this information in the runtime assumed that the guard pages are accounted for correctly as part of the reserved allocation. Nothing in the runtime ever promised that it is valid to access the pages beyond the current redzone (indeed, ASAN would forbid it as well).

vtjnash avatar Aug 21 '24 15:08 vtjnash

Cc @fingolfin

lgoettgens avatar Aug 21 '24 21:08 lgoettgens

I have a hard time understanding the PR description, probably because I am misunderstanding something fundamental, sorry :-/. So some questions:

This cause the runtime to misbehave and crash, since all of the consumers of this information in the runtime

What are these consumers? The description makes it sound as if everything should burn and explode, but at least the Julia test suite seemed to pass.

assumed that the guard pages are accounted for correctly as part of the reserved allocation.

I guess what is "correct" depends on the definition of the interface. I read this as saying: "the semantics of malloc_stack and free_stack changed, but some code relied on the old semantics and got broken by the patch". Is that it?

Nothing in the runtime ever promised that it is valid to access the pages beyond the current redzone (indeed, ASAN would forbid it as well).

I have trouble following you at all here. Access which pages? Which redzones (is that ASAN terminology)?

Anyway: if this is reverted, then I'll dust of my old alternative patch, which modifies jl_active_task_stack to remove the guard page.

Just to explain, the goal of the removed patch was to to simplify scanning task stacks (for a conservative GC) for references to objects. For that we want to stop before hitting the guard page.

fingolfin avatar Aug 22 '24 13:08 fingolfin

(To be clear, I am not objecting to the reversal, I just would like to understand to make sure there is not some deeper issue in my stack scanning code)

fingolfin avatar Aug 22 '24 13:08 fingolfin

The task scanning should stop once it hits the current stack pointer, and never go all of the way to reaching these addresses. The code elsewhere detects if faults occur on the stack and treats them differently than other segfaults.

vtjnash avatar Aug 22 '24 16:08 vtjnash

The redzone is the number of bytes past the current stack pointer that are also used by the function (https://en.wikipedia.org/wiki/Red_zone_(computing))

vtjnash avatar Aug 22 '24 16:08 vtjnash

Yes, I agree --- you should only scan up to the actual stack pointer, and excluding guard pages was just a hack to attempt to make scanning work without knowing the stack pointer.

JeffBezanson avatar Aug 22 '24 21:08 JeffBezanson

Also, 55555 55555 55555 55555 55555 !

JeffBezanson avatar Aug 22 '24 21:08 JeffBezanson

We'd be happy to only scan to the stack pointer, but AFAIK so far it isn't available, or is it now? We've performed the conservative scanning of task stacks for many years now, and it worked well enough. That said, if the SP is available (how?), then from my perspective ideally jl_active_task_stack would be adjusted to use it to adjust its return values

fingolfin avatar Aug 23 '24 03:08 fingolfin

All the code in base (e.g. the unmap here and signal handling elsewhere) wants to know the full size of the memory region being managed, based upon the observed need to add jl_guard_size to every use of the size and base pointer of the region (lines diff here)

vtjnash avatar Aug 29 '24 18:08 vtjnash