Can boost::math::float_distance be sped up?
In the AGM PR, I have found that ~90% of the runtime is spent computing float distances. However, at least for float and double, the following trivial modification reduces that cost to a negligible fraction of the total runtime:
#include <cstdint>
#include <cstring>

// Assumes positive, finite inputs (which is what the AGM code passes in).
int32_t fast_float_distance(float x, float y) {
    static_assert(sizeof(float) == sizeof(int32_t), "float is incorrect size.");
    int32_t xi, yi;
    // memcpy copies the bits without the strict-aliasing violation of reinterpret_cast.
    std::memcpy(&xi, &x, sizeof(xi));
    std::memcpy(&yi, &y, sizeof(yi));
    return yi - xi;
}

int64_t fast_float_distance(double x, double y) {
    static_assert(sizeof(double) == sizeof(int64_t), "double is incorrect size.");
    int64_t xi, yi;
    std::memcpy(&xi, &x, sizeof(xi));
    std::memcpy(&yi, &y, sizeof(yi));
    return yi - xi;
}
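A quick way to sanity-check the bit trick on its happy path is to step a value with std::nextafter and confirm that the integer difference counts the steps. The sketch below is self-contained and only illustrative: float_bits, bit_distance and steps_match are hypothetical names, and it assumes an IEEE 754 binary32 float and positive, finite inputs.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: copy the bits of a float into a signed integer.
static std::int32_t float_bits(float v) {
    static_assert(sizeof(float) == sizeof(std::int32_t), "float is incorrect size.");
    std::int32_t i;
    std::memcpy(&i, &v, sizeof i);
    return i;
}

// Bit-pattern distance from x to y; only meaningful for positive, finite inputs.
std::int32_t bit_distance(float x, float y) {
    return float_bits(y) - float_bits(x);
}

// Step n representable values upward from start, then check that the
// bit-pattern distance between the two endpoints is exactly n.
bool steps_match(float start, int n) {
    float v = start;
    for (int i = 0; i < n; ++i) {
        v = std::nextafter(v, std::numeric_limits<float>::infinity());
    }
    return bit_distance(start, v) == n;
}
```

Under those assumptions, steps_match(1.0f, 1000) should hold, including for ranges that cross a power of two, because the exponent field sits directly above the mantissa in the bit pattern.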
It seems like boost::math::float_distance is considerably more general than this, but can we take a fast path to extract performance in the trivial cases?
That's interesting! Does it pass the tests?
@jzmaddock : Yes; here's some background on the trick.
OK, but as pointed out in the article, your trick fails when the two inputs differ in sign (this includes when one input is zero). I also get a negative rather than a positive result for, say, fast_float_distance(-1, -0.5). All of which can be fixed with some special-case handling, of course...
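One possible form of that special-case handling is to map each bit pattern to an integer key that increases monotonically with the float's value before subtracting. This is only a sketch under stated assumptions, not what boost::math::float_distance does internally: ordered_key and signed_float_distance are hypothetical names, float is assumed to be IEEE 754 binary32, and NaN and infinity are not handled.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: map a float to an integer key that increases
// monotonically with the float's value. Both -0.0f and +0.0f map to 0.
static std::int64_t ordered_key(float v) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float is incorrect size.");
    std::uint32_t bits;
    std::memcpy(&bits, &v, sizeof bits);
    // The low 31 bits are the magnitude; the top bit is the sign.
    const std::int64_t magnitude = bits & 0x7FFFFFFFu;
    return (bits & 0x80000000u) ? -magnitude : magnitude;
}

// Signed count of representable floats from x to y (positive when y > x).
// The result is widened to 64 bits so that opposite-sign inputs cannot overflow.
std::int64_t signed_float_distance(float x, float y) {
    return ordered_key(y) - ordered_key(x);
}
```

With this mapping, signed_float_distance(-1.0f, -0.5f) comes out positive and inputs on opposite sides of zero are handled, at the cost of one extra branch per input.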
@jzmaddock : Yeah, the fact that agm requires positive numbers (over the reals) simplifies the logic considerably.
The wins for float128 are pretty huge:
Without the fast float distance:
AGM<boost::multiprecision::float128> 8411 ns 8377 ns 82937
with it:
AGM<boost::multiprecision::float128> 2241 ns 2230 ns 313072
Implementation:
#ifdef BOOST_HAS_FLOAT128
// Same bit trick for quad precision; assumes positive, finite inputs.
__int128_t fast_float_distance(boost::multiprecision::float128 x, boost::multiprecision::float128 y) {
    static_assert(sizeof(boost::multiprecision::float128) == sizeof(__int128_t), "float128 is incorrect size.");
    __int128_t xi = *reinterpret_cast<__int128_t*>(&x);
    __int128_t yi = *reinterpret_cast<__int128_t*>(&y);
    return yi - xi;
}
#endif
I couldn't get it to work with long double, sadly.
long double is quite irritating sometimes. However, it's also useful ;)