Call-saved registers
We spend a lot of instructions and code bytes saving registers to the stack before calls and reloading them afterwards. We could do less of this (the exact saving is difficult to quantify) if some registers were preserved across function calls and so effectively belonged to the second-to-leaf function, i.e. the caller of a leaf.
It would probably be easiest to handle the --gc=none case first, as this would only involve changing where registers are saved and making register allocation prefer saved registers for values that need to live across a call. With a GC, stack walking becomes substantially more complicated once saved registers are on the stack, because the frame that stored them does not know whether they contain live pointers. The stack walker would have to logically unwind the stack to reconstruct the values of the saved registers in each parent frame.
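The logical unwind described above might look roughly like the following Python sketch. It is only an illustration under stated assumptions: the `Frame` layout, the per-call-site `live_ptr_regs` sets and the register names are all hypothetical and do not describe the existing stack format. The walker starts from the machine registers at the GC point, reports each frame's pointer registers from its current view, then undoes that frame's prologue saves to recover the view its caller had.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Hypothetical frame descriptor, for illustration only."""
    name: str
    # Registers spilled by this function's prologue. The slots hold the
    # *caller's* values, so this frame cannot tell whether they are pointers.
    saved: dict = field(default_factory=dict)
    # Registers that hold live pointers at this frame's pending call site.
    live_ptr_regs: frozenset = frozenset()


def collect_register_roots(regs, frames):
    """Walk frames innermost-first and return (frame, register, value) roots."""
    view = dict(regs)  # register contents as seen by the innermost frame
    roots = []
    for frame in frames:
        for r in sorted(frame.live_ptr_regs):
            roots.append((frame.name, r, view[r]))
        # Undo this frame's saves to reconstruct its caller's registers.
        view.update(frame.saved)
    return roots


# main keeps a pointer in r4 across its call to f; f saves r4 and reuses it
# for an integer; g saves r5 and keeps a pointer in it when the GC runs.
machine_regs = {"r4": 7, "r5": 0x2000}
stack = [
    Frame("g", saved={"r5": 99}, live_ptr_regs=frozenset({"r5"})),
    Frame("f", saved={"r4": 0x1000}),
    Frame("main", live_ptr_regs=frozenset({"r4"})),
]
roots = collect_register_roots(machine_regs, stack)
```

In this example the pointer 0x1000 physically sits in f's frame, but only main's liveness information identifies it as a root. A moving collector would also need to write updated pointers back to the slot or register they were found in, which this sketch does not attempt.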
Stretch goal: allow calling conventions to differ between functions. _Gc, for instance, should save essentially everything so that the code size impact on its (very numerous) callers is minimized. This requires either a global map (ugly, with a potential impact on parallelism in compilationLib) or a change to the representation of calls in IR (likely with major knock-on effects on the proofs).
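As a rough illustration of the global-map option, the Python sketch below looks up which registers a callee preserves and derives what the caller still has to save around the call. The register names, the default convention and the `PRESERVED_BY` table are assumptions made up for this example. Treating _Gc as preserving every register is a simplification of "essentially everything".

```python
# Hypothetical register file and default convention, for illustration only.
ALL_REGS = frozenset(f"r{i}" for i in range(8))
DEFAULT_PRESERVED = frozenset({"r4", "r5", "r6", "r7"})

# Hypothetical global map: function name -> registers it preserves.
# Functions not listed here use the default convention.
PRESERVED_BY = {"_Gc": ALL_REGS}


def regs_to_save_around_call(callee, live_after_call):
    """Registers the caller must spill before calling `callee`."""
    preserved = PRESERVED_BY.get(callee, DEFAULT_PRESERVED)
    return sorted(set(live_after_call) - preserved)


# A call to _Gc costs the caller nothing; an ordinary call still has to
# save r1, which the default convention does not preserve.
around_gc = regs_to_save_around_call("_Gc", {"r1", "r5"})
around_foo = regs_to_save_around_call("foo", {"r1", "r5"})
```

The lookup is keyed on the callee's name, which is why every call site has to consult shared state during compilation. The alternative is to carry the convention in the call's IR representation.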
Stretch goal: heuristically determine calling conventions for user-defined functions, aka "interprocedural register allocation".
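One simple form this could take, shown here only as a sketch and not as a settled design, is to compute a clobber summary for each function: the set of registers it or anything it calls may overwrite. A caller then only needs to save live registers that appear in the callee's summary. The function names and register sets below are hypothetical. The summaries are iterated to a fixed point so that recursive call graphs are handled.

```python
def clobber_summaries(writes, calls):
    """Return function -> registers possibly overwritten by calling it.

    writes: function -> registers the function writes directly
    calls:  function -> functions it may call (every callee is in `writes`)
    """
    summary = {f: set(regs) for f, regs in writes.items()}
    changed = True
    while changed:  # iterate until no summary grows any further
        changed = False
        for f, callees in calls.items():
            for g in callees:
                extra = summary[g] - summary[f]
                if extra:
                    summary[f] |= extra
                    changed = True
    return summary


# Hypothetical three-level call chain: top -> mid -> leaf.
writes = {"leaf": {"r0"}, "mid": {"r1"}, "top": {"r2", "r3"}}
calls = {"top": {"mid"}, "mid": {"leaf"}}
summary = clobber_summaries(writes, calls)
```

Here calling mid can only disturb r0 and r1, so top can keep a value in r2 across that call without saving it.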