wasmtime
wasmtime copied to clipboard
Cranelift: shrink the size of `SigData` further
This could maybe give us some small perf gains due to a smaller working set that better fits in cache.
SigData is defined here: https://cs.github.com/bytecodealliance/wasmtime/blob/348f962d23df0a598ea80629ca6d8e4a158fe153/cranelift/codegen/src/machinst/abi.rs?q=SigData#L601-L627
The two changes we could make to shrink its size are:
-
We have implementation limits on the number of arguments and returns from a function, so we don't need a fulle
Range<u32>to represent the subslice inSigSet::abi_argsfor this signature's arguments and returns. Instead we could have au32start and au16length. Doing this for both args and returns would save a total of 4 bytes. -
SigData::stack_reg_argis currently anOption<usize>but, again because of implementation limits on the number of returns, could beOption<u16>. This would save 12 bytes for this field, but I think we might end up actually saving only 8 on the struct because of alignment.
For anyone who picks this up, we can measure the impact this change has via
- Building the bench API on
main:$ git checkout main $ cargo build --release -p wasmtime-bench-api $ cp target/release/libwasmtime_bench_api.so /tmp/main.so - Building the bench API on the feature branch for this patch:
$ git checkout my-feature-branch $ cargo build --release -p wasmtime-bench-api $ cp target/release/libwasmtime_bench_api.so /tmp/smaller-sig-data.so - Running the sightglass benchmarks:
(You can also try passing$ cd ~/path/to/sightglass $ cargo run --release -- benchmark -e /tmp/main.so -e /tmp/smaller-sig-data.so --stop-after compilation -- benchmarks/default.suite--measure perf-countersto see the effects on cache accesses/misses if you're on linux.)
It might also be possible to entirely remove the end-points of the argument and return ranges. I haven't checked carefully but I think the end-point in one SigData is always equal to the start-point in the next SigData, and the last SigData's end-point is the length of the corresponding backing array.
If we move all the SigData methods that take a SigSet to be on SigSet instead and take a Sig index—which I think is a good idea for API design reasons anyway—then it's easy to access the Sig+1 index.
Are there any constraints on where the stack-ret arg can appear in the list? For example, if it's always the first argument, we could replace the Option<usize> with a bool. If there are no constraints, we could still cut the field's size in half by getting rid of the Option, if we can pick a reserved value like u16::MAX to indicate that there's no stack-ret arg.
Also, do the sized_stack_*_space fields need to be i64? For one thing, what does it mean to reserve a negative amount of stack space? For another, surely 32 bits is enough for the size of a stack frame...?
All together, I think we can shrink SigData from 7 u64 to 5 u32, taking padding and alignment into account, for a savings of 64%.
It might also be possible to entirely remove the end-points of the argument and return ranges. I haven't checked carefully but I think the end-point in one
SigDatais always equal to the start-point in the nextSigData, and the lastSigData's end-point is the length of the corresponding backing array.If we move all the
SigDatamethods that take aSigSetto be onSigSetinstead and take aSigindex—which I think is a good idea for API design reasons anyway—then it's easy to access theSig+1index.
Yeah, this is possible but a larger change and probably not a good first issue anymore.
Are there any constraints on where the stack-ret arg can appear in the list? For example, if it's always the first argument, we could replace the
Option<usize>with abool. If there are no constraints, we could still cut the field's size in half by getting rid of theOption, if we can pick a reserved value likeu16::MAXto indicate that there's no stack-ret arg.
There aren't, but in practice I think it is always the first. Maybe we could just start enforcing that? Again, probably not in the realm of a good first issue though.
Also, do the
sized_stack_*_spacefields need to bei64? For one thing, what does it mean to reserve a negative amount of stack space? For another, surely 32 bits is enough for the size of a stack frame...?
Yeah I was kinda wondering that too. Haven't investigated it, so didn't want to include it in a good first issue.
I'll try out changing the SigData methods as suggested with Sigset
Let us know if you have any questions! I'm excited that you're digging into this.
May I ask what's still missing for this issue?
Good question! I don't remember for sure.
I think the only remaining possibility for shrinking this struct further is the Option<u16> storing stack_ret_arg, which is four bytes with one byte of padding in the middle. I think the isa::CallConv type is one byte, so with all the other fields being four bytes, the call_conv field has three bytes of padding. Therefore if we can pack the calling convention in with the return argument index, we can save four bytes.
However, I don't think that's a "good first issue". I'm not even sure it's worth doing.
So I'm going to go ahead and close this because I think it's basically done. Thanks for checking!