swift-numerics
Initial pass at "relaxed" multiply and add operations.
This commit adds the following implementation hooks to the AlgebraicField protocol:
static func _relaxedAdd(_:Self, _:Self) -> Self
static func _relaxedMul(_:Self, _:Self) -> Self
These are equivalent to + and *, but have "relaxed semantics"; specifically, they license the compiler to reassociate them and to form FMA nodes, which are both significant optimizations that can easily make many common loops 8-10x faster. These transformations perturb results slightly, so they should not be enabled without care, but the results with the relaxed operations are--for most purposes--"just as good as" (and often better than) what strict operations produce. The main thing to beware of is that they are no longer portable; different compiler versions, targets, and optimization flags can produce different results.
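To make "perturb results slightly" concrete, here is a small sketch that uses only the standard library and none of the new relaxed hooks. It performs by hand the two rewrites that relaxed semantics would permit the compiler to make: regrouping an addition, and fusing a multiply with an add (via the standard `addingProduct` method). The specific values are chosen for illustration only.

```swift
// Reassociation: the same three Floats summed with two different groupings.
let a: Float = 1e8
let b: Float = -1e8
let c: Float = 1

let leftToRight = (a + b) + c   // a and b cancel exactly first, then c is added
let regrouped   = a + (b + c)   // c is lost when b + c rounds back to b

// FMA formation: x*x - y computed with two roundings versus one.
let x: Float = 1 + 0x1p-12
let y: Float = 1 + 0x1p-11

let separate = x * x - y                // the product is rounded before the subtraction
let fused    = (-y).addingProduct(x, x) // -y + x*x with a single rounding at the end

print(leftToRight, regrouped)  // the two groupings disagree
print(separate, fused)         // the fused form keeps the low-order bit of x*x
```

Neither answer in each pair is "wrong"; they are different roundings of the same real-number expression, and the fused form is the more accurate one here. With the relaxed operations, which of them you get depends on the code the compiler happens to generate.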
These are then exposed under the Relaxed namespace as:
Relaxed.sum(a, b)
Relaxed.product(a, b)
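The hook-plus-namespace layering can be sketched in a self-contained way as below. This is not the code from this PR: `RelaxedHooks` and `RelaxedSketch` are hypothetical stand-ins for `AlgebraicField` and `Relaxed`, and the stand-in conformance simply forwards to the strict `+` and `*`, so it illustrates only the API shape and not the relaxed code generation.

```swift
// Hypothetical stand-in for the two hooks added to AlgebraicField.
protocol RelaxedHooks {
  static func _relaxedAdd(_ a: Self, _ b: Self) -> Self
  static func _relaxedMul(_ a: Self, _ b: Self) -> Self
}

// Stand-in conformance: plain + and *, so results here are strict.
// A real conformance would need operations the optimizer may
// reassociate and fuse.
extension Float: RelaxedHooks {
  static func _relaxedAdd(_ a: Float, _ b: Float) -> Float { a + b }
  static func _relaxedMul(_ a: Float, _ b: Float) -> Float { a * b }
}

// A caseless enum used purely as a namespace for the public entry points.
enum RelaxedSketch {
  static func sum<T: RelaxedHooks>(_ a: T, _ b: T) -> T {
    T._relaxedAdd(a, b)
  }
  static func product<T: RelaxedHooks>(_ a: T, _ b: T) -> T {
    T._relaxedMul(a, b)
  }
}

// The free-function shape lets the operations drop into reduce and map.
let values: [Float] = [1, 2, 3, 4]
let total = values.reduce(0, RelaxedSketch.sum)
let doubled = values.map { RelaxedSketch.product($0, 2) }
print(total, doubled)
```

Keeping the underscored hooks on the protocol and the readable names in a namespace means callers never touch the hooks directly, and generic code gets the relaxed operations for any conforming type.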
@swift-ci test
Hrm, why are we using a Swift-5.3.3 Linux toolchain for testing instead of something more recent? Still, good to know--if unfortunate--that reassociate(on) is not supported there. I'll have to add a workaround and a note for that.
Some quick perf numbers from my M1 laptop:
repeatedly summing 1024 Floats
  time using reduce(0, +): 0.091 sec
  time using reduce(0, Relaxed.sum): 0.009 sec
  time using vDSP.sum from Accelerate: 0.004 sec
repeated dot-product of 1024 Floats
  time using reduce(0) { $0 + $1*$1 }: 0.085 sec
  time using reduce(0) { Relaxed.multiplyAdd($1, $1, $0) }: 0.011 sec
  time using vDSP.sumOfSquares from Accelerate: 0.005 sec
For "typical" reduction workloads as above, we see roughly an 8-10x speedup over the strict operators, and we're about 2x slower than the hand-written SIMD in Accelerate.