libm: optimize `fmod`
This is kind of a retry at rust-lang/compiler-builtins#898. One of the problems there was that it would have added overhead and regressed performance for typical inputs.
Unlike that PR, this doesn't aim for sub-linear scaling; the cost of evaluating `fmod(x, y)` is still roughly proportional to log2(|x/y|). However, the constant factor is much better. Running the random benchmarks locally, I got these walltime reductions:
fmodf16: -56.9%
fmodf: -85.0%
fmod: -95.4%
fmodf128: -98.7%
Needs rust-lang/compiler-builtins#1011 and rust-lang/compiler-builtins#1012
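For readers unfamiliar with why the cost scales with log2(|x/y|), here is a minimal sketch of the classic bit-at-a-time reduction on integer significands. It is not the code in this PR, and `rem_shifted` is a hypothetical name used only for illustration. It assumes the inputs have already been decomposed so that |x| = mx * 2^shift * 2^ey and |y| = my * 2^ey; the loop then runs once per bit of exponent difference.

```rust
/// Computes (mx * 2^shift) mod my one bit at a time.
/// Hypothetical helper for illustration only, not the PR's implementation.
/// Requires 0 < my < 2^63 so that doubling the remainder cannot overflow.
fn rem_shifted(mx: u64, shift: u32, my: u64) -> u64 {
    assert!(my > 0 && my < (1u64 << 63));
    // Start from the remainder of the unshifted significand.
    let mut r = mx % my;
    // One iteration per bit of exponent difference: this loop is the
    // part whose trip count grows like log2(|x/y|).
    for _ in 0..shift {
        r <<= 1; // r < my < 2^63, so this cannot overflow
        if r >= my {
            r -= my; // r < 2 * my, so one subtraction restores r < my
        }
    }
    r
}
```

For example, `rem_shifted(5, 3, 7)` reduces 5 * 2^3 = 40 modulo 7. An optimization that keeps this structure but handles several bits per iteration improves the constant factor without changing the overall scaling.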
Sorry I haven't reviewed yet; I am excited about this but have been a bit short on time.
Any chance you're willing to split the commits into separate PRs? From a quick glance the first two look good, but I'll need a bit more time for the last one.
> fmodf128: -98.7%
Seriously impressive :)
Yeah, I probably should have split it. I'm not home at this time, but I'll try to do that on Tuesday.
If possible, could you run the walltime benchmarks on something non-x86?
This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.
Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.
Rebased and put the PR message into the commit, since GH won't let me do that without squashing. I'll merge when CI completes.