feat: optimize frame layout for tail-call-only functions
Reduce frame size from 16 to 8 bytes for functions that only make tail calls (FunctionCalls::TailOnly). This optimization:
@cfallin What do you think of something like this? I've only looked at aarch64 for now, since other ISAs such as x64 and s390x look quite different and more complex to implement.
Unfortunately I don't think this is going to work: the stack pointer has to be 16-byte aligned, and aarch64 will actually trap if a memory access uses a misaligned SP.
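To make the alignment objection concrete, here is a small arithmetic model in Rust. It assumes, purely for illustration, that the prologue subtracts the setup-area size from an SP that the caller left 16-byte aligned. `sp_after_prologue` and `is_sp_aligned` are hypothetical helpers, not Cranelift code.

```rust
/// The AArch64 rule cited above: SP must stay 16-byte aligned whenever
/// it is used as the base of a memory access.
const SP_ALIGN: u64 = 16;

/// Hypothetical model: the prologue reserves `setup_area_size` bytes by
/// subtracting it from the SP it received on entry.
fn sp_after_prologue(entry_sp: u64, setup_area_size: u64) -> u64 {
    entry_sp - setup_area_size
}

fn is_sp_aligned(sp: u64) -> bool {
    sp % SP_ALIGN == 0
}

fn main() {
    // The caller hands over a 16-byte-aligned SP (the address is made up).
    let entry_sp: u64 = 0x7fff_ffff_f000;
    for size in [0u64, 8, 16] {
        let sp = sp_after_prologue(entry_sp, size);
        println!("setup area {size:>2} bytes -> SP {sp:#x}, aligned: {}", is_sp_aligned(sp));
    }
}
```

Only the 0-byte and 16-byte setup areas leave SP aligned. An 8-byte "FP only" save always leaves SP 8 bytes off.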
Furthermore, the saving I would expect is not "only push FP, not LR", but "don't push anything at all if the frame is zero-size". This should be the case for tail-calling functions with no stack storage (spillslots, stackslots, or clobbers) and no outgoing argument space.
Don't debuggers rely on frame pointers for stack traces? Could setting the frame size to 0 hurt debugging/unwinding?
Debuggers and profilers should already handle missing stack frames for leaf functions. Besides, debuggers generally use .eh_frame for stack unwinding, falling back to frame pointers only when .eh_frame is not available.
Right -- we already omit frame pointers for functions that are truly leaf functions (no calls at all, with no frame storage); this is a common optimization.
In Wasmtime, where we use our own stack-walking logic and unwinder and want simplicity/robustness, we configure Cranelift never to omit frame pointers; so this optimization largely applies to other uses of Cranelift, like bjorn3's cg_clif.
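The fragility that a frame-pointer-only unwinder has to live with can be seen in a toy walker. This is a generic sketch, not Wasmtime's unwinder. It models AArch64 frame records as a saved FP at address `fp` and a saved LR at `fp + 8`, and all addresses are made up. If the innermost function `g` omits its frame record, FP still points at its caller `f`'s record. The walk then goes straight from the PC in `g` to the return address in `main`, and `f` disappears from the trace.

```rust
use std::collections::HashMap;

/// Toy AArch64 stack: at address `fp` lives the caller's saved FP, and
/// at `fp + 8` the saved LR (return address). Addresses are made up.
struct ToyStack {
    mem: HashMap<u64, u64>,
}

impl ToyStack {
    fn read(&self, addr: u64) -> u64 {
        self.mem.get(&addr).copied().unwrap_or(0)
    }

    /// Frame-pointer walk: the current PC, then the saved LR of every
    /// frame record reachable from `fp`, stopping at a null FP.
    fn walk(&self, pc: u64, mut fp: u64) -> Vec<u64> {
        let mut trace = vec![pc];
        while fp != 0 {
            trace.push(self.read(fp + 8)); // saved LR
            fp = self.read(fp); // caller's saved FP
        }
        trace
    }
}

/// Call chain main -> f -> g, with `g` currently executing at PC 0xC000.
fn example_stack() -> ToyStack {
    let mut mem = HashMap::new();
    // main's record: no caller FP, returns to 0xA000.
    mem.insert(0x1000, 0);
    mem.insert(0x1008, 0xA000);
    // f's record: caller FP is main's, returns into main at 0xB000.
    mem.insert(0x0f00, 0x1000);
    mem.insert(0x0f08, 0xB000);
    // g's record, reachable only if g sets up a frame: returns into f at 0xD000.
    mem.insert(0x0e00, 0x0f00);
    mem.insert(0x0e08, 0xD000);
    ToyStack { mem }
}

fn main() {
    let stack = example_stack();
    // g pushed a frame record, so FP points at it: every frame appears.
    println!("{:x?}", stack.walk(0xC000, 0x0e00)); // [c000, d000, b000, a000]
    // g omitted its frame: FP still points at f's record, so f vanishes
    // (its return address 0xD000 lives only in LR).
    println!("{:x?}", stack.walk(0xC000, 0x0f00)); // [c000, b000, a000]
}
```

An .eh_frame-driven unwinder knows that `g` has no frame record and takes the return address from LR instead. A pure frame-pointer walker has no such information, which is the trade-off behind never omitting frame pointers.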
Then would something like this be safe?
```diff
 // Compute linkage frame size.
 let setup_area_size = if flags.preserve_frame_pointers()
     || flags.unwind_info()
     // The function arguments that are passed on the stack are addressed
     // relative to the Frame Pointer.
     || incoming_args_size > 0
     || clobber_size > 0
     || fixed_frame_storage_size > 0
 {
     16 // FP, LR
 } else {
     match function_calls {
         FunctionCalls::Regular => 16,
         FunctionCalls::None => 0,
-        FunctionCalls::TailOnly => 8,
+        FunctionCalls::TailOnly => 0,
     }
 };
```
I think you'll want to check the tail args and outgoing args sizes as well (the other parameters to compute_frame_layout). Basically, if any part of the frame needs to exist, then we need to do the FP setup even if we only have tail calls.
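Putting the review feedback together, the decision reduces to one predicate. Set up FP/LR if a regular call will overwrite LR or if any part of the frame exists. Otherwise, emit nothing at all. The sketch below is a standalone model that assumes a hypothetical `FrameInputs` struct. Its fields mirror the quantities named in this thread, but it is not Cranelift's actual `compute_frame_layout` signature. It keeps the proposal's `preserve_frame_pointers` and `unwind_info` conditions unchanged.

```rust
/// Mirrors the `FunctionCalls` classification discussed above.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum FunctionCalls {
    /// Leaf function: no calls at all.
    None,
    /// Only tail calls, which branch without writing LR.
    TailOnly,
    /// At least one regular call, which overwrites LR.
    Regular,
}

/// Hypothetical inputs, named after the quantities in this thread.
#[derive(Clone, Copy, Debug)]
struct FrameInputs {
    preserve_frame_pointers: bool,
    unwind_info: bool,
    function_calls: FunctionCalls,
    incoming_args_size: u32,
    tail_args_size: u32,
    clobber_size: u32,
    fixed_frame_storage_size: u32,
    outgoing_args_size: u32,
}

/// FP/LR setup-area size: 16 if any part of the frame must exist,
/// otherwise 0. Never 8, since that would misalign SP.
fn setup_area_size(f: &FrameInputs) -> u32 {
    let frame_needed = f.preserve_frame_pointers
        || f.unwind_info
        // A regular call overwrites LR, so LR must be saved first.
        || f.function_calls == FunctionCalls::Regular
        // Stack arguments are addressed relative to FP.
        || f.incoming_args_size > 0
        || f.tail_args_size > 0
        || f.clobber_size > 0
        || f.fixed_frame_storage_size > 0
        || f.outgoing_args_size > 0;
    if frame_needed { 16 } else { 0 }
}

/// A tail-call-only function with no stack storage of any kind.
fn bare_tail_only() -> FrameInputs {
    FrameInputs {
        preserve_frame_pointers: false,
        unwind_info: false,
        function_calls: FunctionCalls::TailOnly,
        incoming_args_size: 0,
        tail_args_size: 0,
        clobber_size: 0,
        fixed_frame_storage_size: 0,
        outgoing_args_size: 0,
    }
}

fn main() {
    let bare = bare_tail_only();
    let leaf = FrameInputs { function_calls: FunctionCalls::None, ..bare };
    let with_outgoing = FrameInputs { outgoing_args_size: 16, ..bare };
    let regular = FrameInputs { function_calls: FunctionCalls::Regular, ..bare };
    println!("tail-only, no storage: {}", setup_area_size(&bare)); // 0
    println!("leaf, no storage: {}", setup_area_size(&leaf)); // 0
    println!("tail-only + outgoing args: {}", setup_area_size(&with_outgoing)); // 16
    println!("regular call: {}", setup_area_size(&regular)); // 16
}
```

In this model `TailOnly` and `None` are treated identically. The 8-byte case disappears because only 0 and 16 keep SP aligned.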