
Can boost::math::float_distance be sped up?

Open NAThompson opened this issue 4 years ago • 5 comments

In the AGM PR, I have found that ~90% of the runtime is spent computing float distances. However, at least for float and double, the following trivial modification reduces the cost of the distance computation to a negligible fraction of the total runtime:

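    #include <cstdint>
    #include <cstring>

    // ULP distance from x to y as the difference of the IEEE-754 bit patterns.
    // Only valid when x and y both have the sign bit clear (e.g. both positive, as in the AGM).
    std::int32_t fast_float_distance(float x, float y) {
        static_assert(sizeof(float) == sizeof(std::int32_t), "float is incorrect size.");
        std::int32_t xi, yi;
        std::memcpy(&xi, &x, sizeof xi);  // memcpy avoids the strict-aliasing UB of reinterpret_cast
        std::memcpy(&yi, &y, sizeof yi);
        return yi - xi;
    }

    std::int64_t fast_float_distance(double x, double y) {
        static_assert(sizeof(double) == sizeof(std::int64_t), "double is incorrect size.");
        std::int64_t xi, yi;
        std::memcpy(&xi, &x, sizeof xi);
        std::memcpy(&yi, &y, sizeof yi);
        return yi - xi;
    }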
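Why the subtraction counts representable values: for positive IEEE-754 numbers, the bit pattern read as an integer increases monotonically with the value, and consecutive floats differ by exactly one in that integer, even across an exponent boundary (the carry out of the fraction field increments the exponent). The sketch below checks that claim for float against a brute-force count with std::nextafter. Both function names, bits_distance and count_steps, are hypothetical helpers for illustration, and positive finite inputs are assumed.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: the bit-pattern subtraction from the issue, reimplemented
// with std::memcpy. Assumes x and y are both positive and finite.
std::int32_t bits_distance(float x, float y) {
    std::int32_t xi, yi;
    std::memcpy(&xi, &x, sizeof xi);
    std::memcpy(&yi, &y, sizeof yi);
    return yi - xi;
}

// Hypothetical reference: walk from x up to y one representable float at a time
// and count the steps. Assumes x <= y. Slow, but obviously correct.
std::int64_t count_steps(float x, float y) {
    std::int64_t n = 0;
    while (x < y) {
        x = std::nextafter(x, y);
        ++n;
    }
    return n;
}
```

For example, bits_distance(0.75f, 1.25f) and count_steps(0.75f, 1.25f) both give 6291456: 2^22 steps of size 2^-24 below 1.0 plus 2^21 steps of size 2^-23 above it.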

It seems like boost::math::float_distance is considerably more general than this, but can we route the trivial cases through a fast path to recover this performance?

NAThompson, May 28 '20 14:05

That's interesting! Does it pass the tests?

jzmaddock, May 28 '20 16:05

@jzmaddock: Yes; here's some background on the trick.

NAThompson, May 28 '20 17:05

OK, but as pointed out in the article, your trick fails when the two inputs differ in sign (this includes when one input is zero). I also get a negative rather than a positive result for, say, fast_float_distance(-1, -0.5). All of this can be fixed with some special-case handling, of course...
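One common way to do that special-case handling (a sketch, not necessarily what Boost.Math does) is to remap the sign-magnitude bit patterns of negative floats so that integer order matches floating-point order: non-negative floats keep their bits, and a negative float maps to minus its magnitude bits. Then -0.0f and +0.0f both map to 0, mixed signs work, and fast_float_distance(-1, -0.5) comes out positive. The names ordered_key and signed_float_distance are hypothetical, and NaN inputs are not handled.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: map a float's bits to an integer that is monotonic in
// the float's value. For negative floats, i == INT32_MIN + magnitude, so
// INT32_MIN - i == -magnitude, computed without overflow.
std::int32_t ordered_key(float x) {
    std::int32_t i;
    std::memcpy(&i, &x, sizeof i);
    return i < 0 ? std::numeric_limits<std::int32_t>::min() - i : i;
}

// Hypothetical: signed count of representable floats from x to y, valid for
// mixed signs and signed zeros. The subtraction is done in 64 bits so it cannot overflow.
std::int64_t signed_float_distance(float x, float y) {
    return static_cast<std::int64_t>(ordered_key(y)) - ordered_key(x);
}
```

For double the same mapping works with std::int64_t keys, but the difference of two such keys can exceed the int64_t range for widely separated inputs, so the result would need a wider type or a floating-point return value (boost::math::float_distance returns a floating-point type).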

jzmaddock, May 28 '20 17:05

@jzmaddock: Yeah, the fact that the AGM requires positive numbers (over the reals) simplifies the logic considerably.

The wins for float128 are pretty huge:

Without the fast float distance:

    AGM<boost::multiprecision::float128>       8411 ns         8377 ns        82937

With it:

    AGM<boost::multiprecision::float128>       2241 ns         2230 ns       313072

Implementation:

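    #include <boost/config.hpp>  // Boost.Config feature macros such as BOOST_HAS_FLOAT128
    #ifdef BOOST_HAS_FLOAT128
    #include <cstring>
    #include <boost/multiprecision/float128.hpp>

    // Same bit-pattern trick for IEEE binary128, with the same positive-inputs restriction.
    __int128_t fast_float_distance(boost::multiprecision::float128 x, boost::multiprecision::float128 y) {
        static_assert(sizeof(boost::multiprecision::float128) == sizeof(__int128_t), "float128 is incorrect size.");
        __int128_t xi, yi;
        std::memcpy(&xi, &x, sizeof xi);
        std::memcpy(&yi, &y, sizeof yi);
        return yi - xi;
    }
    #endif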

I couldn't get it to work with long double, sadly.
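A plausible explanation, assuming an x86-64 target where long double is the 80-bit x87 extended format (an inference, not something established in this thread): that format stores the leading significand bit explicitly, so for normal numbers the 64-bit significand field runs from $2^{63}$ to $2^{64}-1$ instead of wrapping from all-ones to zero. Reading the 80 meaningful bits of a positive value as an unsigned integer with exponent field $e$, the step from the largest value in one binade to the smallest value in the next is:

$$
\underbrace{e \cdot 2^{64} + \left(2^{64} - 1\right)}_{\text{largest value with exponent } e}
\;\longrightarrow\;
\underbrace{(e+1) \cdot 2^{64} + 2^{63}}_{\text{next value up}},
\qquad
\Delta = 2^{63} + 1 .
$$

So the integer difference still counts ULPs correctly within a binade but over-counts by $2^{63}$ for every binade boundary crossed. On that target sizeof(long double) is also 16, with 6 padding bytes whose contents are unspecified, which would further corrupt a 128-bit reinterpretation.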

NAThompson, May 28 '20 18:05

long double is quite irritating sometimes. However, it's also useful ;)

cosurgi, Jun 05 '20 15:06