phobos std.math.exponential: Implement log/log2/log10 for double- and float-precision

std.math.exponential: Implement log/log2/log10 for double- and float-precision

Open ibuclaw opened this issue 1 year ago • 4 comments

Software implementations are unchanged - only conditionally compiling in the log(x) = z + z^^3 R(z) / S(z) branch depending on the float format (as can be seen when reviewing with w=1).

Coefficients are taken from Cephes single and cmath libraries.

FYI @kinke.

Jul 06 '22 05:07 ibuclaw

Thanks for your pull request, @ibuclaw!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + phobos#8499"

Jul 06 '22 05:07 dlang-bot

Testsuite failure will be fixed in dlang/dmd#14270.

Jul 06 '22 07:07 ibuclaw

Thanks for pushing this forward! This was one of the things on my personal backlog to tackle 😁

It's now time for LLVM/GCC intrinsics to shine.

Users who rely on implicit cast will see a slightly different value, due to precision conversion, right?

Jul 06 '22 10:07 ljmf00

Thanks for pushing this forward! This was one of the things on my personal backlog to tackle grin

It's now time for LLVM/GCC intrinsics to shine.

There's no intrinsic per say available for these functions. LLVM/GCC may recognize and compute them at compile-time, other than that, any __builtin_ versions of these would just forward to the mathlib, which may not satisfy D's pure requirements.

Users who rely on implicit cast will see a slightly different value, due to precision conversion, right?

If you were implicitly casting from an integer, you'll get a compile-time error that'll need fixing up, if you were implicitly converting from a float, you will indeed see a slightly different value.

Jul 06 '22 10:07 ibuclaw

Added deprecations for calling the logN family functions with an integral argument. That's the quickest way to get this over the line, rather than waiting for downstreams to fix their codebases.

Dec 11 '22 00:12 ibuclaw

@ibuclaw The documentation for log1p says "For very small x, log1p(x) will be more accurate than log(1 + x).". It looks to me that when calling log1p with doubles or floats that it basically just calls logImpl(1+x). I see some specialization in the real overload for the x87 that I don't really understand, but I could assume that it handles this greater accuracy part. Am I missing something, or does the documentation need to get improved here?

Feb 06 '23 17:02 jmh530

@ibuclaw The documentation for log1p says "For very small x, log1p(x) will be more accurate than log(1 + x).". It looks to me that when calling log1p with doubles or floats that it basically just calls logImpl(1+x). I see some specialization in the real overload for the x87 that I don't really understand, but I could assume that it handles this greater accuracy part. Am I missing something, or does the documentation need to get improved here?

That comment is true for x87, and was likely taken straight from the Intel manual for fyl2xp1.

For small epsilon (ε) values, more significant digits can be retained by using the FYL2XP1 instruction than by using (ε+1) as an argument to the FYL2X instruction.

Unless you're using DMD, you'll always be using the soft-based implementation, where that note never applies.

Feb 06 '23 18:02 ibuclaw

@ibuclaw The comment at the top of that file also says that the implementation of log1p is based on that from CEPHES, but it doesn't look that way to me (based on this webassembly version that I assume is the same code as the C one [1]). [`] https://github.com/nearform/node-cephes/blob/dcf7c538405d6e115bfaf8f5346b588daa4bd2b2/cephes/unity.c

Feb 06 '23 18:02 jmh530

@ibuclaw The comment at the top of that file also says that the implementation of log1p is based on that from CEPHES, but it doesn't look that way to me (based on this webassembly version that I assume is the same code as the C one [1]). [`] https://github.com/nearform/node-cephes/blob/dcf7c538405d6e115bfaf8f5346b588daa4bd2b2/cephes/unity.c

For the (originally ported) real implementation, both logl and log1pl are identical save for x = xm1 + 1.0L;, so the forwarding to logImpl(x+1) was a simplification. This has been carried over into the double and float ports too. There's likely a small optimization to be made if we were to instead port the double/float implementations of log1p from Cephes too, but we aren't loosing on inaccuracy by using the float/double port of log instead.

Feb 06 '23 19:02 ibuclaw

@ibuclaw I understanding the thinking. I'm just not happy with the documentation at the moment. I don't see a float implementation of log1p from Cephes, only double. I think glibc has one.

Feb 06 '23 19:02 jmh530

@ibuclaw If you use log1p(1.0e-15) you get a value of like 1.11e-15 instead of closer to 1e-15 and if you use log1p(1.0e-16) then the result is 0 instead of the same. What I'm not sure I appreciated enough is that the coefficients from CEPHES (at least for double) are in std.math.exponential (https://github.com/dlang/phobos/blob/6d647c582569390ba04b8c172456c8e3134e8c33/std/math/exponential.d#L2939). I think the function intends to provide more accuracy for log when it is close to 1, but it actually fails at doing that. The problem is that anyone who thinks they are actually getting that accuracy for log1p isn't actually getting it.

See: https://issues.dlang.org/show_bug.cgi?id=23750

Feb 27 '23 23:02 jmh530

I have a test generator, the output of I will raise a PR against phobos for, because as it turns out the number of unittests in phobos for the logN family is approximately 1 per function. Shocking.

This is the diff output of running it with an inlined version of log1p for SQRT1_2 <= x+1 <= SQRT2.

--- ../libphobos/dlog.d	2023-02-21 15:06:15.766105898 +0100
+++ ../libphobos/dlog1.d	2023-02-28 03:04:01.353245637 +0100
@@ -333,11 +333,11 @@ static foreach (alias logfn; AliasSeq!(l
         double[2][16] vals = [
             [double.nan, double.nan],[-double.nan, double.nan],
             [double.infinity, double.infinity], [-double.infinity, double.nan],
-            [double.min_normal, 0x0p+0], [-double.min_normal, 0x0p+0],
+            [double.min_normal, 0x1p-1022], [-double.min_normal, -0x1p-1022],
             [double.max, 0x1.62e42fefa39efp+9], [-double.max, double.nan],
-            [double.min_normal / 2, 0x0p+0], [-double.min_normal / 2, 0x0p+0],
+            [double.min_normal / 2, 0x0.8p-1022], [-double.min_normal / 2, -0x0.8p-1022],
             [double.max / 2, 0x1.628b76e3a7b61p+9], [-double.max / 2, double.nan],
-            [double.min_normal / 3, 0x0p+0], [-double.min_normal / 3, 0x0p+0],
+            [double.min_normal / 3, 0x0.5555555555555p-1022], [-double.min_normal / 3, -0x0.5555555555555p-1022],
             [double.max / 3, 0x1.6257909bce36ep+9], [-double.max / 3, double.nan],
         ];
         testLog1p(vals);

Feb 28 '23 02:02 ibuclaw

Difference between gdc (cephes) and dmd (inline_yl2x)

--- ../libphobos/cephes.d	2023-02-28 04:00:19.064218681 +0100
+++ ../libphobos/yl2x.d	2023-02-28 04:00:29.916500757 +0100
@@ -162,8 +162,8 @@ static void testLog(T)(T[2][] vals)
             [F(0x1p+0), F(0x0p+0)], [F(0x1p+1), F(0x1.62e42fefa39ef358p-1)], [F(0x1p+2), F(0x1.62e42fefa39ef358p+0)],
             [F(0x1p+3), F(0x1.0a2b23f3bab73682p+1)], [F(0x1p+4), F(0x1.62e42fefa39ef358p+1)], [F(0x1p+5), F(0x1.bb9d3beb8c86b02ep+1)],
             [F(0x1p+6), F(0x1.0a2b23f3bab73682p+2)], [F(0x1p+7), F(0x1.3687a9f1af2b14ecp+2)], [F(0x1p+8), F(0x1.62e42fefa39ef358p+2)],
-            [F(0x1p+9), F(0x1.8f40b5ed9812d1c2p+2)], [F(0x1p+10), F(0x1.bb9d3beb8c86b02ep+2)], [F(0x1p+11), F(0x1.e7f9c1e980fa8e98p+2)],
-            [F(0x1.8p+1), F(0x1.193ea7aad030a976p+0)], [F(0x1.4p+2), F(0x1.9c041f7ed8d336bp+0)], [F(0x1.cp+2), F(0x1.f2272ae325a57546p+0)],
+            [F(0x1p+9), F(0x1.8f40b5ed9812d1c4p+2)], [F(0x1p+10), F(0x1.bb9d3beb8c86b02ep+2)], [F(0x1p+11), F(0x1.e7f9c1e980fa8e98p+2)],
+            [F(0x1.8p+1), F(0x1.193ea7aad030a978p+0)], [F(0x1.4p+2), F(0x1.9c041f7ed8d336bp+0)], [F(0x1.cp+2), F(0x1.f2272ae325a57548p+0)],
             [F(0x1.ep+3), F(0x1.5aa16394d481f014p+1)], [F(0x1.1p+4), F(0x1.6aa6bc1fa7f79cfp+1)], [F(0x1.fp+4), F(0x1.b78ce48912b59f12p+1)],
             [F(0x1.08p+5), F(0x1.bf8d8f4d5b8d1038p+1)], [F(0x1.f8p+5), F(0x1.09291e8e3181b20ep+2)], [F(0x1.04p+6), F(0x1.0b292939429755ap+2)],
             [F(-0x0p+0), -F.infinity], [F(0x0p+0), -F.infinity], [F(0x1.388p+13), F(0x1.26bb1bbb5551582ep+3)],
@@ -234,11 +234,11 @@ static void testLog10(T)(T[2][] vals)
     {{
         F[2][24] vals = [
             [F(0x1p+0), F(0x0p+0)], [F(0x1p+1), F(0x1.34413509f79fef32p-2)], [F(0x1p+2), F(0x1.34413509f79fef32p-1)],
-            [F(0x1p+3), F(0x1.ce61cf8ef36fe6cap-1)], [F(0x1p+4), F(0x1.34413509f79fef32p+0)], [F(0x1p+5), F(0x1.8151824c7587eafep+0)],
-            [F(0x1p+6), F(0x1.ce61cf8ef36fe6cap+0)], [F(0x1p+7), F(0x1.0db90e68b8abf14cp+1)], [F(0x1p+8), F(0x1.34413509f79fef32p+1)],
+            [F(0x1p+3), F(0x1.ce61cf8ef36fe6ccp-1)], [F(0x1p+4), F(0x1.34413509f79fef32p+0)], [F(0x1p+5), F(0x1.8151824c7587eafep+0)],
+            [F(0x1p+6), F(0x1.ce61cf8ef36fe6ccp+0)], [F(0x1p+7), F(0x1.0db90e68b8abf14cp+1)], [F(0x1p+8), F(0x1.34413509f79fef32p+1)],
             [F(0x1p+9), F(0x1.5ac95bab3693ed18p+1)], [F(0x1p+10), F(0x1.8151824c7587eafep+1)], [F(0x1p+11), F(0x1.a7d9a8edb47be8e4p+1)],
-            [F(0x1.8p+1), F(0x1.e8927964fd5fd08cp-2)], [F(0x1.4p+2), F(0x1.65df657b04300868p-1)], [F(0x1.cp+2), F(0x1.b0b0b0b78cc3f296p-1)],
-            [F(0x1.ep+3), F(0x1.2d145116c16ff856p+0)], [F(0x1.1p+4), F(0x1.3afeb354b7d9731ap+0)], [F(0x1.fp+4), F(0x1.7dc9e145867e62eap+0)],
+            [F(0x1.8p+1), F(0x1.e8927964fd5fd08ep-2)], [F(0x1.4p+2), F(0x1.65df657b04300868p-1)], [F(0x1.cp+2), F(0x1.b0b0b0b78cc3f298p-1)],
+            [F(0x1.ep+3), F(0x1.2d145116c16ff858p+0)], [F(0x1.1p+4), F(0x1.3afeb354b7d9731cp+0)], [F(0x1.fp+4), F(0x1.7dc9e145867e62eap+0)],
             [F(0x1.08p+5), F(0x1.84bd545e4baeddp+0)], [F(0x1.f8p+5), F(0x1.cca1950e4511e192p+0)], [F(0x1.04p+6), F(0x1.d01b16f9433cf7b8p+0)],
             [F(-0x0p+0), -F.infinity], [F(0x0p+0), -F.infinity], [F(0x1.388p+13), F(0x1p+2)],
         ];
@@ -265,8 +265,8 @@ static void testLog10(T)(T[2][] vals)
             [double.max, 0x1.34413509f79ffp+8], [-double.max, double.nan],
             [double.min_normal / 2, -0x1.33f424bcb522p+8], [-double.min_normal / 2, double.nan],
             [double.max / 2, 0x1.33f424bcb522p+8], [-double.max / 2, double.nan],
-            [double.min_normal / 3, -0x1.3421390dcbe37p+8], [-double.min_normal / 3, double.nan],
-            [double.max / 3, 0x1.33c7106b9e609p+8], [-double.max / 3, double.nan],
+            [double.min_normal / 3, -0x1.3421e242e09fp+8], [-double.min_normal / 3, double.nan],
+            [double.max / 3, 0x1.33c6673689a51p+8], [-double.max / 3, double.nan],
         ];
         testLog10(vals);
     }
@@ -279,7 +279,7 @@ static void testLog10(T)(T[2][] vals)
             [real.max, 0x1.34413509f79fef32p+12L], [-real.max, real.nan],
             [real.min_normal / 2, -0x1.343c6405237810b2p+12L], [-real.min_normal / 2, real.nan],
             [real.max / 2, 0x1.343c6405237810b2p+12L], [-real.max / 2, real.nan],
-            [real.min_normal / 3, -0x1.343f354a34e427bp+12L], [-real.min_normal / 3, real.nan],
+            [real.min_normal / 3, -0x1.343f354a34e427b2p+12L], [-real.min_normal / 3, real.nan],
             [real.max / 3, 0x1.343992c0120bf9b2p+12L], [-real.max / 3, real.nan],
         ];
         testLog10(vals);
@@ -307,11 +307,11 @@ static void testLog1p(T)(T[2][] vals)
     static foreach (F; AliasSeq!(float, double, real))
     {{
         F[2][24] vals = [
-            [F(0x1p+0), F(0x1.62e42fefa39ef358p-1)], [F(0x1p+1), F(0x1.193ea7aad030a976p+0)], [F(0x1p+2), F(0x1.9c041f7ed8d336bp+0)],
+            [F(0x1p+0), F(0x1.62e42fefa39ef358p-1)], [F(0x1p+1), F(0x1.193ea7aad030a978p+0)], [F(0x1p+2), F(0x1.9c041f7ed8d336bp+0)],
             [F(0x1p+3), F(0x1.193ea7aad030a976p+1)], [F(0x1p+4), F(0x1.6aa6bc1fa7f79cfp+1)], [F(0x1p+5), F(0x1.bf8d8f4d5b8d1038p+1)],
             [F(0x1p+6), F(0x1.0b292939429755ap+2)], [F(0x1p+7), F(0x1.37072a9b5b6cb31p+2)], [F(0x1p+8), F(0x1.63241004e9010ad8p+2)],
-            [F(0x1p+9), F(0x1.8f60adf041bde2a8p+2)], [F(0x1p+10), F(0x1.bbad39ebe1cc08b6p+2)], [F(0x1p+11), F(0x1.e801c1698ba4395cp+2)],
-            [F(0x1.8p+1), F(0x1.62e42fefa39ef358p+0)], [F(0x1.4p+2), F(0x1.cab0bfa2a2002322p+0)], [F(0x1.cp+2), F(0x1.0a2b23f3bab73682p+1)],
+            [F(0x1p+9), F(0x1.8f60adf041bde2aap+2)], [F(0x1p+10), F(0x1.bbad39ebe1cc08b6p+2)], [F(0x1p+11), F(0x1.e801c1698ba4395ep+2)],
+            [F(0x1.8p+1), F(0x1.62e42fefa39ef358p+0)], [F(0x1.4p+2), F(0x1.cab0bfa2a2002324p+0)], [F(0x1.cp+2), F(0x1.0a2b23f3bab73682p+1)],
             [F(0x1.ep+3), F(0x1.62e42fefa39ef358p+1)], [F(0x1.1p+4), F(0x1.71f7b3a6b918664cp+1)], [F(0x1.fp+4), F(0x1.bb9d3beb8c86b02ep+1)],
             [F(0x1.08p+5), F(0x1.c35fc81b90df59c6p+1)], [F(0x1.f8p+5), F(0x1.0a2b23f3bab73682p+2)], [F(0x1.04p+6), F(0x1.0c234da4a23a6686p+2)],
             [F(-0x0p+0), F(-0x0p+0)], [F(0x0p+0), F(0x0p+0)], [F(0x1.388p+13), F(0x1.26bbed6fbd84182ep+3)],
@@ -335,11 +335,11 @@ static void testLog1p(T)(T[2][] vals)
         double[2][16] vals = [
             [double.nan, double.nan],[-double.nan, double.nan],
             [double.infinity, double.infinity], [-double.infinity, double.nan],
-            [double.min_normal, 0x1p-1022], [-double.min_normal, -0x1p-1022],
+            [double.min_normal, 0x0p+0], [-double.min_normal, 0x0p+0],
             [double.max, 0x1.62e42fefa39efp+9], [-double.max, double.nan],
-            [double.min_normal / 2, 0x0.8p-1022], [-double.min_normal / 2, -0x0.8p-1022],
+            [double.min_normal / 2, 0x0p+0], [-double.min_normal / 2, 0x0p+0],
             [double.max / 2, 0x1.628b76e3a7b61p+9], [-double.max / 2, double.nan],
-            [double.min_normal / 3, 0x0.5555555555555p-1022], [-double.min_normal / 3, -0x0.5555555555555p-1022],
+            [double.min_normal / 3, 0x0p+0], [-double.min_normal / 3, 0x0p+0],
             [double.max / 3, 0x1.6257909bce36ep+9], [-double.max / 3, double.nan],
         ];
         testLog1p(vals);
@@ -349,11 +349,11 @@ static void testLog1p(T)(T[2][] vals)
         real[2][16] vals = [
             [real.nan, real.nan],[-real.nan, real.nan],
             [real.infinity, real.infinity], [-real.infinity, real.nan],
-            [real.min_normal, 0x0p+0L], [-real.min_normal, 0x0p+0L],
+            [real.min_normal, 0x1p-16382L], [-real.min_normal, -0x1p-16382L],
             [real.max, 0x1.62e42fefa39ef358p+13L], [-real.max, real.nan],
-            [real.min_normal / 2, 0x0p+0L], [-real.min_normal / 2, 0x0p+0L],
+            [real.min_normal / 2, 0x0.8p-16382L], [-real.min_normal / 2, -0x0.8p-16382L],
             [real.max / 2, 0x1.62dea45ee3e064dcp+13L], [-real.max / 2, real.nan],
-            [real.min_normal / 3, 0x0p+0L], [-real.min_normal / 3, 0x0p+0L],
+            [real.min_normal / 3, 0x0.5555555555555556p-16382L], [-real.min_normal / 3, -0x0.5555555555555556p-16382L],
             [real.max / 3, 0x1.62db65fa664871d2p+13L], [-real.max / 3, real.nan],
         ];
         testLog1p(vals);

Feb 28 '23 03:02 ibuclaw

Tthis is my version log1pImpl that seems to work as I would expect. The logImpl version kind of assumes that the variable is around 1. When I tried to do a version could be called from both I was getting errors. In retrospect I think it could be fixed by assuming exp=0`, but I think this is a bit cleaner. The functionality to return the C1/C2 could be modified to be like LogCoeffs.

private T log1pImpl(T)(T x) @safe pure nothrow @nogc
{
    import std.math.algebraic: poly;
    import std.math.constants: SQRT2, SQRT1_2;
    import std.math.traits : isNaN, isInfinity, signbit;

    // Special cases.
    if (isNaN(x) || x == 0.0)
        return x;
    if (isInfinity(x) && !signbit(x))
        return x;
    if (x == -1.0)
        return -T.infinity;
    if (x < -1.0)
        return T.nan;
    if ((1.0 + x) > SQRT2 || (1.0 + x) < SQRT1_2)
        return logImpl(x + 1.0);

    alias F = floatTraits!T;
    alias coeffs = LogCoeffs!T;

    static if (F.realFormat == RealFormat.ieeeExtended ||
               F.realFormat == RealFormat.ieeeExtended53 ||
               F.realFormat == RealFormat.ieeeQuadruple)
    {
        // C1 + C2 = LN2.
        enum T C1 = 6.93145751953125E-1L;
        enum T C2 = 1.428606820309417232121458176568075500134E-6L;
    }
    else static if (F.realFormat == RealFormat.ieeeDouble)
    {
        enum T C1 = 0.693359375;
        enum T C2 = -2.121944400546905827679e-4;
    }
    else static if (F.realFormat == RealFormat.ieeeSingle)
    {
        enum T C1 = 0.693359375;
        enum T C2 = -2.12194440e-4;
    }
    else
        static assert(0, "Not implemented for this architecture");

    T y;
    T z = x * x;
    static if (F.realFormat == RealFormat.ieeeSingle)
        y = x * (z * poly(x, coeffs.logP));
    else
        y = x * (z * poly(x, coeffs.logP) / poly(x, coeffs.logQ));
    return x - 0.5 * z + y;
}

Feb 28 '23 03:02 jmh530

Tthis is my version log1pImpl that seems to work as I would expect. The logImpl version kind of assumes that the variable is around 1. When I tried to do a version could be called from both I was getting errors. In retrospect I think it could be fixed by assuming exp=0`, but I think this is a bit cleaner. The functionality to return the C1/C2 could be modified to be like LogCoeffs.

Right, I just added the following block for doubles.

     static if (floatTraits!T.realFormat == RealFormat.ieeeDouble)
     {
         // When the input is within the range 1/sqrt(2) <= x+1 <= sqrt(2), compute
         // log1p inline. Forwarding to log() would otherwise result in inaccuracies.
         const T xp1 = x + 1.0;
         if (xp1 >= SQRT1_2 && xp1 <= SQRT2)
         {
             alias coeffs = LogCoeffs!T;

             T px = poly(x, coeffs.logp1P);
             T qx = poly(x, coeffs.logp1Q);
             const T xx = x * x;
             qx = x + ((cast(T) -0.5) * xx + x * (xx * px / qx));
             return qx;
         }
     }
 
     return logImpl(x + 1.0);

I see that the thinko is that log1p reuses the original x if it is a small number, otherwise the +1.0 would destroy its value.

Feb 28 '23 04:02 ibuclaw

Not exactly sure how to read the test output you have, but the inline D version is done with reals, correct?

Feb 28 '23 04:02 jmh530

You're right that the +1.0 would destroy its value. That's why log1p as a function exists. Not sure why you would remove the part that handles floats, but I didn't test for that.

Feb 28 '23 04:02 jmh530

Not exactly sure how to read the test output you have, but the inline D version is done with reals, correct?

There are some cases where the inline yl2x branch is inaccurate. I checked against wolfram and cephes result matches the most number of digits.
log1p(double) for the inline yl2x branch is broken for really small numbers.
log1p(real) for the cephes branch is broken for really small numbers.

This is the gist of what I'm generating, just generates a bunch of unittest blocks to put into phobos once we've sorted out the problems.

https://pastebin.com/ceNJuyqL

Feb 28 '23 04:02 ibuclaw

phobos phobos copied to clipboard

std.math.exponential: Implement log/log2/log10 for double- and float-precision

Bugzilla references

Testing this PR locally

phobos
phobos copied to clipboard