mathnet-numerics Performance of SpecialFunctions.Erf and SpecialFunctions.Erfinv

Performance of SpecialFunctions.Erf and SpecialFunctions.Erfinv

Open QuantifEye opened this issue 3 years ago • 1 comments

I have used mathnet to provide the special functions Erf and ErfInv. They are however creating a bottleneck as I need the Error function to implement the Gaussian copula in the context of Monte Carlo simulation,

I have optimized the functions, by removing unnecessary local variables and redundant branches.

The error function and inverse error function relies heavily on Polynomial.Evaluate which was a big bottleneck. For example, many branches use the statement.

Polynomial.Evaluate(z, arrayN) / Polynomial.Evaluate(z, arrayD)

I have replaced these with various functions performing this manually. as below.

private static double ErfImplA(double z)
{
    const double n0 = 0.0033791670955125737;
    const double n1 = -0.0007369565304816795;
    const double n2 = -0.3747323373929196;
    const double n3 = 0.08174424487335873;
    const double n4 = -0.04210893199365486;
    const double n5 = 0.007016570951209575;
    const double n6 = -0.004950912559824351;
    const double n7 = 0.0008716465990379225;

    const double d0 = 1.0;
    const double d1 = -0.21808821808792464;
    const double d2 = 0.4125429727254421;
    const double d3 = -0.08418911478731067;
    const double d4 = 0.06553388564002416;
    const double d5 = -0.012001960445494177;
    const double d6 = 0.00408165558926174;
    const double d7 = -0.0006159007215577697;

    var n =
        FusedMultiplyAdd(
            FusedMultiplyAdd(
                FusedMultiplyAdd(
                    FusedMultiplyAdd(
                        FusedMultiplyAdd(
                            FusedMultiplyAdd(
                                FusedMultiplyAdd(n7,
                                    z, n6),
                                z, n5),
                            z, n4),
                        z, n3),
                    z, n2),
                z, n1),
            z, n0);
    var d =
        FusedMultiplyAdd(
            FusedMultiplyAdd(
                FusedMultiplyAdd(
                    FusedMultiplyAdd(
                        FusedMultiplyAdd(
                            FusedMultiplyAdd(
                                FusedMultiplyAdd(d7,
                                    z, d6),
                                z, d5),
                            z, d4),
                        z, d3),
                    z, d2),
                z, d1),
            z, d0);
    return n / d;
}

This however comes at the cost of multiplying the polynomials out by hand but the results are significant.

|                        Method |        Mean |      Error |     StdDev |      Median |
|------------------------------ |------------:|-----------:|-----------:|------------:|
|   ErfOriginalAllBranchAverage |   915.76 ns |  18.293 ns |  19.574 ns |   908.30 ns |
|  ErfOptimisedAllBranchAverage |   457.32 ns |   6.222 ns |   5.196 ns |   457.92 ns |
|  ErfOriginalHotBranchAverage1 |   237.21 ns |   2.324 ns |   2.060 ns |   236.85 ns |
| ErfOptimisedHotBranchAverage1 |   114.23 ns |   2.301 ns |   5.601 ns |   113.38 ns |
|  ErfOriginalHotBranchAverage2 |    70.52 ns |   1.440 ns |   3.692 ns |    69.29 ns |
| ErfOptimisedHotBranchAverage2 |    26.51 ns |   0.552 ns |   0.542 ns |    26.38 ns |
|                OriginalErfInv | 6,498.57 ns | 129.614 ns | 310.547 ns | 6,409.22 ns |
|               OptimisedErfInv | 2,592.28 ns |  49.925 ns |  46.700 ns | 2,577.26 ns |

The outputs are not identical to the current implementation, but are very close (1.1102230246251565E-16 in small test of error function).

I have attached my implementation and benchmark code.

I am not sure if this would be considered a breaking change. If it cannot be included in Mathnet, I will add it to my library instead.

See example code here.

Mar 17 '21 22:03 QuantifEye

Hi,

Could this behavior be platform specific? I am running your code sample on my MacBook Pro M1 (in Release mode), and get the following results. It seems that the library implementation is faster in all cases, am I reading this correctly?

// * Summary *

BenchmarkDotNet=v0.13.1, OS=macOS Monterey 12.3 (21E230) [Darwin 21.4.0]
Apple M1 2.40GHz, 1 CPU, 8 logical and 8 physical cores
.NET SDK=6.0.103
  [Host]     : .NET 5.0.15 (5.0.1522.11506), X64 RyuJIT  [AttachedDebugger]
  DefaultJob : .NET 5.0.15 (5.0.1522.11506), X64 RyuJIT


|                        Method |         Mean |      Error |     StdDev |
|------------------------------ |-------------:|-----------:|-----------:|
|   ErfOriginalAllBranchAverage |    738.83 ns |   2.640 ns |   2.470 ns |
|  ErfOptimisedAllBranchAverage |  2,843.27 ns |  13.305 ns |  12.445 ns |
|  ErfOriginalHotBranchAverage1 |    224.54 ns |   0.684 ns |   0.640 ns |
| ErfOptimisedHotBranchAverage1 |  1,005.54 ns |   4.267 ns |   3.991 ns |
|  ErfOriginalHotBranchAverage2 |     55.91 ns |   0.080 ns |   0.075 ns |
| ErfOptimisedHotBranchAverage2 |    320.37 ns |   5.005 ns |   4.179 ns |
|                OriginalErfInv |  4,846.12 ns |  54.304 ns |  48.139 ns |
|               OptimisedErfInv | 37,130.96 ns | 365.730 ns | 342.104 ns |

// * Warnings *
Environment
  Summary -> Benchmark was executed with attached debugger

// * Hints *
Outliers
  Benchmarks.ErfOriginalHotBranchAverage2: Default  -> 2 outliers were detected (57.56 ns, 57.61 ns)
  Benchmarks.ErfOptimisedHotBranchAverage2: Default -> 2 outliers were removed (345.91 ns, 357.92 ns)
  Benchmarks.OriginalErfInv: Default                -> 1 outlier  was  removed (5.16 us)

// * Legends *
  Mean   : Arithmetic mean of all measurements
  Error  : Half of 99.9% confidence interval
  StdDev : Standard deviation of all measurements
  1 ns   : 1 Nanosecond (0.000000001 sec)

// ***** BenchmarkRunner: End *****
// ** Remained 0 benchmark(s) to run **
Run time: 00:02:35 (155.5 sec), executed benchmarks: 8

Global total time: 00:02:46 (166.84 sec), executed benchmarks: 8
// * Artifacts cleanup *

Apr 10 '22 21:04 jkalias

mathnet-numerics mathnet-numerics copied to clipboard

Performance of SpecialFunctions.Erf and SpecialFunctions.Erfinv

mathnet-numerics
mathnet-numerics copied to clipboard