dmd icon indicating copy to clipboard operation
dmd copied to clipboard

[big feature] Implement core.math.pow() for ^^ operator

Open ibuclaw opened this issue 2 years ago • 2 comments

Adds two new functions for the power operator:

    typeof(X * Y) pow(X, Y)
	if (IntOrFloat!X && Int!Y);

    typeof(X * Y) pow(X, Y)
	if (IntOrFloat!X && Float!Y);

Lowers x ^^ y operator to core.math.pow(x, y).

Self assessment score:

  • 2/10 readability

  • 0/10 maintainability

  • 10/10 speed (so long as you're using 64-bit doubles)

  • 42/10 compiler+druntime changed together in one PR.

Implementation

The integral power function is pretty much taken from the std.math template. Where it differs is rather than having two separate templates for pow(int, int) and pow(float, int), these have been merged into one pow(int_or_float, int) function.

The floating point power function similarly merges the pow(int, float) and pow(float, float) templates into one pow(int_or_float, float) function. It is otherwise a complete reworking, as the std.math implementation fails to deliver on two fronts:

  1. It computes at real precision only.
  2. It implements pow as sign * exp2(y * log2(x)) or (sign * exp2(fyl2x(x, y))), which is not accurate in practice.

The new implementation is split up into two (or three) different algorithms:

  • The 64-bit double and real (32-bit floats are forced at double precision) path is a port of the widely used algorithm described in "Fast evaluation of elementary mathematical functions with correctly rounded last bit" by Ziv 1991 (LGPL).
  • The 80-bit and 128-bit real path is a port of (another widely used) algorithm in Cephes by Stephen L. Moshier, 1984-1992.
  • The 128-bit IBM real path is a variant on the Cephes path, but with different overflow constraints and coefficients to work with the weird format. (This is still TODO/WIP)

Unittesting

The unittests in std.math are OK, though have an affinity towards real precision testing. Most can be hoisted into core.math and tweaked to validate all types, i.e: foreach (T; AliasSeq!(float, double, real)).

Note: some tests in std.math are just plain wrong, thankfully they only concern the special case handling, i.e pow(1, nan) == 1 (std.math asserts the result is nan).

Benchmarking

Naive test only confirms what we already know about x87 speeds (note, std.math.pow only computes at real precision)

{
    import core.math;
    pragma(inline, false)
    void f1() { cast(void)pow(123, 789); }
    pragma(inline, false)
    void f2() { cast(void)pow(123.456, 789); }
    pragma(inline, false)
    void f3() { cast(void)pow(123, 789.012); }
    pragma(inline, false)
    void f4() { cast(void)pow(123.456, 789.012); }
    pragma(inline, false)
    void f5() { cast(void)pow(123.456L, 789); }
    pragma(inline, false)
    void f6() { cast(void)pow(123, 789.012L); }
    pragma(inline, false)
    void f7() { cast(void)pow(123.456L, 789.012L); }
    import std.math : stdpow = pow;
    pragma(inline, false)
    void f8() { cast(void)stdpow(123, 789); }
    pragma(inline, false)
    void f9() { cast(void)stdpow(123.456, 789); }
    pragma(inline, false)
    void f10() { cast(void)stdpow(123, 789.012); }
    pragma(inline, false)
    void f11() { cast(void)stdpow(123.456, 789.012); }

    import std.datetime.stopwatch;
    auto r = benchmark!(f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11)(100_000_000);
    import std.stdio;
    writeln("corepow(int, int)      : ", r[0]);
    writeln("corepow(double, int)   : ", r[1]);
    writeln("corepow(int, double)   : ", r[2]);
    writeln("corepow(double, double): ", r[3]);
    writeln("corepow(real, int)     : ", r[4]);
    writeln("corepow(int, real)     : ", r[5]);
    writeln("corepow(real, real)    : ", r[6]);
    writeln("stdpow(int, int)       : ", r[7]);
    writeln("stdpow(double, int)    : ", r[8]);
    writeln("stdpow(int, double)    : ", r[9]);
    writeln("stdpow(double, double) : ", r[10]);
}
__EOF__
corepow(int, int)      : 656 ms, 103 μs, and 7 hnsecs
corepow(double, int)   : 636 ms, 773 μs, and 3 hnsecs
corepow(int, double)   : 963 ms, 588 μs, and 2 hnsecs
corepow(double, double): 960 ms, 462 μs, and 2 hnsecs
corepow(real, int)     : 769 ms and 272 μs
corepow(int, real)     : 19 secs, 487 ms, 782 μs, and 4 hnsecs
corepow(real, real)    : 19 secs, 665 ms, 617 μs, and 1 hnsec
stdpow(int, int)       : 754 ms, 980 μs, and 8 hnsecs
stdpow(double, int)    : 785 ms, 29 μs, and 1 hnsec
stdpow(int, double)    : 16 secs, 560 ms, 262 μs, and 7 hnsecs
stdpow(double, double) : 20 secs, 161 ms, 376 μs, and 3 hnsecs

Tested on:

  • [ ] x86_64-Linux-gnu (LE, 80-bit)
  • [ ] powerpc64-linux-gnu (BE, ibm)
  • [ ] aarch64-linux-gnu (LE, 128-bit)
  • [ ] i686-freebsd13 (LE, 80-bit rounded)
  • [ ] sparc64-sun-solaris2.11 (BE, 128-bit)
  • [ ] mips-linux-gnu (LE, 64-bit)

ibuclaw avatar Jul 12 '22 22:07 ibuclaw

Thanks for your pull request, @ibuclaw!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + dmd#14297"

dlang-bot avatar Jul 12 '22 22:07 dlang-bot

Adds two new functions for the power operator:

    typeof(X * Y) pow(X, Y)
	if (IntOrFloat!X && Int!Y);

    typeof(X * Y) pow(X, Y)
	if (IntOrFloat!X && Float!Y);

Lowers x ^^ y operator to core.math.pow(x, y).

Self assessment score:

* **2/10** readability

* **0/10** maintainability

* **10/10** speed (so long as you're using 64-bit doubles)
corepow(int, int)      : 656 ms, 103 μs, and 7 hnsecs
corepow(double, int)   : 636 ms, 773 μs, and 3 hnsecs
corepow(int, double)   : 963 ms, 588 μs, and 2 hnsecs
corepow(double, double): 960 ms, 462 μs, and 2 hnsecs
corepow(real, int)     : 769 ms and 272 μs
corepow(int, real)     : 19 secs, 487 ms, 782 μs, and 4 hnsecs
corepow(real, real)    : 19 secs, 665 ms, 617 μs, and 1 hnsec
stdpow(int, int)       : 754 ms, 980 μs, and 8 hnsecs
stdpow(double, int)    : 785 ms, 29 μs, and 1 hnsec
stdpow(int, double)    : 16 secs, 560 ms, 262 μs, and 7 hnsecs
stdpow(double, double) : 20 secs, 161 ms, 376 μs, and 3 hnsecs
* **42/10** compiler+druntime changed together in one PR.

over what test program/benchmark + compiler settings etc.

maxhaton avatar Jul 13 '22 00:07 maxhaton