NyuziToolchain
Vararg functions don't work correctly without prototype
A vararg function expects all of its parameters on the stack. The problem is that, if no prototype is provided, the compiler uses the normal ABI and puts the parameters in registers, so the called function does not read them correctly.
Redesign ABI so it works correctly in this case.
MIPS deals with this by reserving space in the call frame for four arguments. The called function then copies the register values into these slots so it can have a contiguous array of all arguments. The challenge on Nyuzi is that it can pass up to 8 arguments in registers, and some of those can be vector registers. This would result in a lot of wasted stack space for every call if done the same way.
FWIW, that's precisely how the arm64 ABI works. It passes the first 8 int and first 8 quad fp args via registers, and the first thing the variadic function does is dump both sets down to the stack. That seems to use up 224 bytes on the stack right off the bat.
x86-64 has an ABI hack where the caller passes a value in al (an upper bound on the number of vector registers used) to signify whether any fp args are in use, which saves the callee the trouble of splatting a bunch of SSE registers down to the stack.
Reserving space for 8 vector registers on this architecture would take 8*64 = 512 bytes per call, which seemed like a lot to me, but not that much more than 224. I guess the ARM folks figured that was okay. A problem with this approach given the way the Nyuzi ABI is defined is that the called function doesn't know in which order scalar and vector registers are used relative to each other. For example, if a function is called with:
func(vec, scalar, vec, vec, scalar)
The parameters will be put in the following registers:
func(v0, s0, v1, v2, s1)
Call the same function as:
func(scalar, vec, vec, vec, scalar)
It is passed as:
func(s0, v0, v1, v2, s1)
So the called function can't really know whether to copy s0 or v0 into the first slot on the stack.
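To make the ambiguity concrete, here is a minimal sketch in C that models the assignment rule described above: scalar and vector arguments draw from separate register files, each numbered in order of appearance. The names `arg_kind` and `assign_registers` are hypothetical and exist only for illustration; this is not code from the Nyuzi backend, and it ignores the limit on how many arguments fit in registers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model, not the real backend: each argument is either a
   scalar or a vector, and each kind has its own register counter. */
enum arg_kind { ARG_SCALAR, ARG_VECTOR };

/* Writes the register chosen for each argument into out, separated by
   spaces, e.g. "v0 s0 v1". The register-count limit is ignored here. */
static void assign_registers(const enum arg_kind *args, int count,
                             char *out, size_t out_size)
{
    int next_scalar = 0;
    int next_vector = 0;
    size_t used = 0;

    assert(out_size > 0);
    out[0] = '\0';
    for (int i = 0; i < count; i++) {
        const char *sep = (i == 0) ? "" : " ";
        int written;

        if (args[i] == ARG_VECTOR)
            written = snprintf(out + used, out_size - used, "%sv%d", sep,
                               next_vector++);
        else
            written = snprintf(out + used, out_size - used, "%ss%d", sep,
                               next_scalar++);

        /* Stop if the output buffer is too small for this entry. */
        if (written < 0 || (size_t)written >= out_size - used)
            break;
        used += (size_t)written;
    }
}
```

Feeding it the two argument lists from the example gives two different register sequences, even though both calls use exactly the same set of registers (s0, s1, v0, v1, v2). The set of live registers alone therefore tells the callee nothing about the original argument order.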
Another potential approach I considered would be to make __builtin_va_start/__builtin_va_arg smarter. va_list would contain a bit more information:
struct va_list {
    int scalar_reg_index;
    int vector_reg_index;
    void *stack_arg_base;
};
va_start might alloca a space on the stack and blat all the registers there, scalars first, then vectors, then va_arg would check:
if (is_vector) {
    if (vector_reg_index++ < MAX_VECTOR_ARGS) {
        // copy argument from saved vector register area
    } else {
        // copy from stack_arg_base and update stack_arg_base
    }
} else {
    if (scalar_reg_index++ < MAX_SCALAR_ARGS) {
        // copy argument from saved scalar register area
    } else {
        // copy from stack_arg_base and update stack_arg_base
    }
}
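As a rough illustration of that lookup, here is a sketch in plain C that simulates it on the host. Everything in it is an assumption for demonstration purposes: the `sketch_` names, the extra `saved` pointer in the list, the limits of 8 registers of each kind, and a vector modeled as 16 `int` lanes (64 bytes where `int` is 4 bytes). It also ignores stack alignment, which a real implementation would have to handle for vector arguments.

```c
#include <assert.h>
#include <string.h>

#define MAX_SCALAR_ARGS 8  /* assumed limit on scalar argument registers */
#define MAX_VECTOR_ARGS 8  /* assumed limit on vector argument registers */
#define VECTOR_LANES 16    /* assumed: 16 lanes of int per vector */

typedef struct { int lane[VECTOR_LANES]; } sketch_vec;

/* Area that va_start would fill: scalars first, then vectors. */
struct sketch_reg_save_area {
    int scalars[MAX_SCALAR_ARGS];
    sketch_vec vectors[MAX_VECTOR_ARGS];
};

/* The three fields from the proposal plus a hypothetical pointer to the
   saved registers. */
struct sketch_va_list {
    int scalar_reg_index;
    int vector_reg_index;
    const unsigned char *stack_arg_base;
    const struct sketch_reg_save_area *saved;
};

/* Fetch the next scalar: from the save area while scalar registers remain,
   otherwise from the stack argument area. */
static int sketch_next_scalar(struct sketch_va_list *ap)
{
    int value;
    if (ap->scalar_reg_index < MAX_SCALAR_ARGS) {
        value = ap->saved->scalars[ap->scalar_reg_index++];
    } else {
        memcpy(&value, ap->stack_arg_base, sizeof value);
        ap->stack_arg_base += sizeof value;
    }
    return value;
}

/* Same logic for vectors, using the vector index and vector save area. */
static sketch_vec sketch_next_vector(struct sketch_va_list *ap)
{
    sketch_vec value;
    if (ap->vector_reg_index < MAX_VECTOR_ARGS) {
        value = ap->saved->vectors[ap->vector_reg_index++];
    } else {
        memcpy(&value, ap->stack_arg_base, sizeof value);
        ap->stack_arg_base += sizeof value;
    }
    return value;
}
```

In this sketch va_start would presumably initialize the two indices to the number of scalar and vector registers already consumed by the named parameters. The caller of va_arg knows the requested type at compile time, so choosing between the scalar and vector path costs nothing at run time.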
The ABI could reserve a register to indicate whether the call is vararg, but that register would be wasted for every call (99% of which are not vararg), and there would be extra overhead to set it up for every call.
The other approach I've been considering has a simpler implementation :)
SDValue NyuziTargetLowering::LowerCall(TargetLowering::CallLoweringInfo &CLI,
                                       SmallVectorImpl<SDValue> &InVals) const {
  if (CLI.IsVarArg)
    llvm_unreachable("No!");