WIP: Try using preserve_none for setjmp

Open · Keno opened this issue 4 months ago · 1 comment

The preserve_none calling convention is a new calling convention in clang (>= 19) and gcc that preserves a much smaller set of registers (rsp and rbp on x86_64; lr and fp on aarch64). As a result, if setjmp uses this calling convention, every other register is already treated as clobbered by the call, so the remaining callee-saved registers no longer need to be stored in the setjmp buffer. That lets us shrink the buffer and use fewer instructions to fill it. The tradeoff, of course, is that the caller may still need some of those registers live across the call, in which case it has to save them itself: both the stack usage and the instructions just move to the caller (which is strictly worse).

It is not clear that this is useful for exceptions (which already carry a fair bit of state, so even on the happy path the savings are not necessarily that big), but I am thinking about using it for #60281, which has different characteristics, so this is an easy way to find out whether there are any unexpected challenges.
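
A minimal sketch of what the buffer shrinks to on x86_64, assuming the SysV callee-saved registers (rbx, rbp, r12-r15) plus rsp and the resume address for the conventional case. The struct names and layouts are hypothetical, for illustration only, and are not Julia's actual jump buffer definition:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: what a conventional setjmp saves on x86_64 SysV.
 * rbx, rbp and r12-r15 are callee-saved; rsp and the resume address
 * (the rip of the setjmp call site) are needed to get back there. */
typedef struct {
    uint64_t rbx, rbp, r12, r13, r14, r15;
    uint64_t rsp;
    uint64_t rip;
} full_jmpbuf_x86_64;

/* Hypothetical layout if setjmp is preserve_none: callers already treat
 * rbx and r12-r15 as clobbered, so only the registers the convention
 * preserves (rbp, rsp) plus the resume address remain. */
typedef struct {
    uint64_t rbp;
    uint64_t rsp;
    uint64_t rip;
} minimal_jmpbuf_x86_64;

_Static_assert(sizeof(full_jmpbuf_x86_64) == 64, "8 registers x 8 bytes");
_Static_assert(sizeof(minimal_jmpbuf_x86_64) == 24, "3 registers x 8 bytes");

/* Number of 8-byte stores a save routine would need for a buffer. */
static size_t slots(size_t bytes) {
    assert(bytes % sizeof(uint64_t) == 0);
    return bytes / sizeof(uint64_t);
}
```

Under this sketch the save routine stores three slots instead of eight; the spills for rbx and r12-r15 do not disappear, they move into whichever callers actually keep values in those registers across the call.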

Note that preserve_none is a very recent compiler feature, so most compilers in use today do not support it yet. For compatibility, this PR allows the runtime and the generated code to use different jump buffer formats.
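
One way the compile-time side of this could look, as a sketch only: detect the attribute with __has_attribute and record which buffer format the runtime was built with, so code generated by a newer LLVM can agree with it. HAVE_PRESERVE_NONE, PRESERVE_NONE, runtime_jmpbuf_format, add_one and check_preserve_none_call are hypothetical names, not the PR's actual code:

```c
#include <assert.h>

/* GCC >= 5 and clang provide __has_attribute; treat anything else as
 * "attribute not supported". */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Hypothetical macros. Only x86_64 and aarch64 are considered, since
 * those are the targets discussed for preserve_none. */
#if __has_attribute(preserve_none) && (defined(__x86_64__) || defined(__aarch64__))
#define HAVE_PRESERVE_NONE 1
#define PRESERVE_NONE __attribute__((preserve_none))
#else
#define HAVE_PRESERVE_NONE 0
#define PRESERVE_NONE
#endif

/* Reports which buffer format this build of the runtime expects. */
static const char *runtime_jmpbuf_format(void) {
    return HAVE_PRESERVE_NONE ? "minimal" : "full";
}

/* Any function can carry the attribute; callers compiled by the same
 * compiler save whatever registers they need around the call. */
static PRESERVE_NONE int add_one(int x) { return x + 1; }

/* The attribute changes how registers are saved, not what is computed. */
static void check_preserve_none_call(void) { assert(add_one(41) == 42); }
```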

Keno · Dec 05 '25 00:12

If the function doesn't use all the registers available to it, could this make setjmp cheaper, since it won't save useless information? I doubt that holds on x86, which has so few registers, but it might on aarch64. I've seen setjmp take a couple of percent of the runtime on aarch64 macOS when spawning lots and lots of tasks.
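
To check this kind of claim on a given machine, a rough microbenchmark of setjmp alone can help. This is a hypothetical helper (setjmp_ns_per_call is an illustrative name) that times repeated setjmp calls with no longjmp, using the C11 wall clock timespec_get, so expect noise:

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <time.h>

/* Average cost of one setjmp call in nanoseconds over `iters` calls.
 * Nothing ever calls longjmp, so each setjmp returns 0 exactly once.
 * Results are noisy and depend on the libc (for example, whether
 * setjmp also saves the signal mask, as it does on macOS). */
static double setjmp_ns_per_call(size_t iters) {
    assert(iters > 0);
    jmp_buf buf;
    volatile size_t longjmps = 0; /* stays 0: no longjmp happens */
    struct timespec t0, t1;

    timespec_get(&t0, TIME_UTC);
    for (size_t i = 0; i < iters; i++) {
        if (setjmp(buf) != 0)
            longjmps++;
    }
    timespec_get(&t1, TIME_UTC);

    assert(longjmps == 0);
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)iters;
}
```

Comparing the result against the per-task work gives a rough idea of how much a smaller buffer could save on that platform.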

gbaraldi · Dec 05 '25 01:12

Yeah, the real pain is that not only are there way more callee-saved GPRs on aarch64 (x19-x29 + sp, plus lr for the return address), but the low 64 bits of the SIMD registers v8-v15 (d8-d15) are callee-saved too. That's something like 168 bytes total, vs 64 on x86.
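
For reference, here is the arithmetic under AAPCS64, which only requires the low 64 bits of v8-v15 (d8-d15) to be preserved. The layouts are hypothetical sketches, not Julia's or any libc's actual jmp_buf; both include lr as the resume address and sp:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of what a conventional setjmp saves under AAPCS64:
 * x19-x28 (callee-saved), x29 (fp), x30 (lr, the resume address), sp,
 * and d8-d15, the low 64 bits of v8-v15. */
typedef struct {
    uint64_t x[10];   /* x19..x28 */
    uint64_t fp, lr, sp;
    uint64_t d[8];    /* d8..d15 */
} full_jmpbuf_aarch64;

/* Hypothetical layout if setjmp is preserve_none: only fp, lr and sp. */
typedef struct {
    uint64_t fp, lr, sp;
} minimal_jmpbuf_aarch64;

/* Bytes no longer written per setjmp when switching layouts. */
static size_t bytes_saved_aarch64(void) {
    size_t saved = sizeof(full_jmpbuf_aarch64) - sizeof(minimal_jmpbuf_aarch64);
    assert(saved % sizeof(uint64_t) == 0); /* whole registers only */
    return saved;
}
```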

I'm a little sad that GCC doesn't seem to have an equivalent of LLVM's preserve_allcc/coldcc, which I want for the slow path in PLT thunks.
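
As a C-level analogy only (not actual PLT thunk code), a lazily resolved call with an out-of-line slow path might look like this on clang, where preserve_all is the C attribute for LLVM's preserve_allcc. As far as I know there is no C attribute for coldcc, so __attribute__((cold)), which is only a placement hint rather than a calling convention, stands in for it. SLOW_PATH_CC, resolve_target and call_target are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Hypothetical macro: with clang's preserve_all the callee saves almost
 * every register, so the fast path need not spill around the rare call.
 * GCC has no equivalent, so the macro expands to nothing there. */
#if __has_attribute(preserve_all) && (defined(__x86_64__) || defined(__aarch64__))
#define SLOW_PATH_CC __attribute__((preserve_all))
#else
#define SLOW_PATH_CC
#endif

typedef int (*target_fn)(int);

static int real_target(int x) { return 2 * x; }

/* Slot filled on first use, loosely like a lazily bound PLT entry. */
static target_fn cached_target = NULL;
static int resolve_count = 0;

/* Slow path: runs once, kept out of line. */
static __attribute__((noinline, cold)) SLOW_PATH_CC target_fn resolve_target(void) {
    resolve_count++;
    cached_target = real_target;
    return cached_target;
}

/* Fast path: a load and an indirect call once the slot is filled. */
static int call_target(int x) {
    target_fn f = cached_target;
    if (f == NULL)
        f = resolve_target();
    assert(f != NULL);
    return f(x);
}
```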

xal-0 · Dec 11 '25 20:12