mathnet-numerics Repetitive calculations in statistical distribution classes

I have noticed that many statistical distribution classes duplicate code.

For example:

The instance version of PDF() reimplements the formula instead of calling the static version with the instance's fields passed as arguments
Samples() reimplements the sampling logic instead of simply calling Sample() repeatedly

This violates the Don't Repeat Yourself (DRY) principle. Was this a deliberate decision, or should the distributions be updated to minimize redundancy?

Jan 01 '14 00:01 Superbest

Continuous Distributions with redundant instance and static implementations:

Cauchy
Chi
ChiSquared
ContinuousUniform
Exponential
FisherSnedecor
InverseGamma
Laplace
LogNormal
Normal
Pareto
Rayleigh
Weibull

Continuous Distributions where the instance method redirects to the static implementation:

Beta
Erlang
Gamma
Stable
StudentT
Triangular

Jan 14 '14 21:01 cdrnet

Thanks for pointing out this somewhat gray area. The primary reason for having two versions is that the static one has to range-check the distribution parameters on every call, while the instance one does not (the parameters were already verified when the instance was created).

However, until verified by benchmarks, this is a typical case of premature optimization. Branching can be very expensive, or negligible if the CPU's branch prediction works well. I'm happy to drop the duplication if there is no significant difference between A and B (with CDF as an example):

A: loop { acc += X.CDF(a, b, z); }
B: x = new X(a,b); loop { acc += x.CumulativeDistribution(z); }

Note that we have to expect these routines to be called from within an inner loop, so being 10% faster can justify some code duplication if the duplicated code is "short" and both cases are covered by tests.
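The A/B comparison above could be set up roughly as follows. This is only a sketch: `ExponentialSketch` and `CdfBenchmark` are hypothetical stand-ins written for this example (they are not Math.NET types), and a `Stopwatch` loop is a crude measurement compared with a proper benchmarking harness.

```csharp
using System;
using System.Diagnostics;

// Hypothetical stand-in for a distribution class; NOT the Math.NET implementation.
public sealed class ExponentialSketch
{
    private readonly double _rate;

    public ExponentialSketch(double rate)
    {
        // The parameter is validated once, at construction time.
        if (!(rate > 0.0)) throw new ArgumentOutOfRangeException(nameof(rate));
        _rate = rate;
    }

    // Instance version: no range check, the constructor already did it.
    public double CumulativeDistribution(double x)
    {
        return x < 0.0 ? 0.0 : 1.0 - Math.Exp(-_rate * x);
    }

    // Static version: has to range-check the parameter on every call.
    public static double CDF(double rate, double x)
    {
        if (!(rate > 0.0)) throw new ArgumentOutOfRangeException(nameof(rate));
        return x < 0.0 ? 0.0 : 1.0 - Math.Exp(-rate * x);
    }
}

public static class CdfBenchmark
{
    // Case A: static call (with range check) inside the loop.
    public static double RunStatic(double rate, int iterations)
    {
        double acc = 0.0;
        for (int i = 0; i < iterations; i++)
            acc += ExponentialSketch.CDF(rate, i * 1e-6);
        return acc;
    }

    // Case B: construct once, then call the unchecked instance method in the loop.
    public static double RunInstance(double rate, int iterations)
    {
        var dist = new ExponentialSketch(rate);
        double acc = 0.0;
        for (int i = 0; i < iterations; i++)
            acc += dist.CumulativeDistribution(i * 1e-6);
        return acc;
    }

    // Wall-clock time of one run of the given body.
    public static TimeSpan Time(Func<double> body)
    {
        var sw = Stopwatch.StartNew();
        body();
        sw.Stop();
        return sw.Elapsed;
    }
}
```

To compare, one would call `CdfBenchmark.Time(() => CdfBenchmark.RunStatic(2.0, 10000000))` and the `RunInstance` equivalent several times each, discarding the first run so that JIT compilation is not counted. Returning the accumulator keeps the loop from being optimized away.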

Jan 14 '14 22:01 cdrnet

I think the classical solution to the range checking issue is this: Make a private, static method which does not range check. Then have the public instance method call the private static method, and have the public static method call the private static method after checking the range.
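A minimal sketch of that layout, using a made-up `ExponentialShared` class (the class and the names `DensityUnchecked` and `IsValidRate` are assumptions for illustration, not existing Math.NET members):

```csharp
using System;

// Hypothetical example class; NOT the Math.NET implementation.
public sealed class ExponentialShared
{
    private readonly double _rate;

    public ExponentialShared(double rate)
    {
        // Range check happens once, when the instance is created.
        if (!IsValidRate(rate)) throw new ArgumentOutOfRangeException(nameof(rate));
        _rate = rate;
    }

    // Public instance method: parameters are already verified, so go straight
    // to the shared unchecked implementation.
    public double Density(double x)
    {
        return DensityUnchecked(_rate, x);
    }

    // Public static method: range-check first, then use the same implementation.
    public static double PDF(double rate, double x)
    {
        if (!IsValidRate(rate)) throw new ArgumentOutOfRangeException(nameof(rate));
        return DensityUnchecked(rate, x);
    }

    // The formula lives in exactly one place and assumes valid parameters.
    private static double DensityUnchecked(double rate, double x)
    {
        return x < 0.0 ? 0.0 : rate * Math.Exp(-rate * x);
    }

    // Rejects NaN, zero, negative and infinite rates.
    private static bool IsValidRate(double rate)
    {
        return rate > 0.0 && !double.IsInfinity(rate);
    }
}
```

The cost is one extra private method per function. Whether the additional call is inlined by the JIT, and therefore free, is something only a benchmark can show.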

I can understand if you consider the resulting function bloat unacceptable, though. (I really only brought this up because I was curious myself, since I'm still learning C#.)

I'll try to do the benchmark you describe.

Jan 15 '14 21:01 Superbest

Indeed, we do exactly that in most distributions for random number sampling, with the private static SampleUnchecked functions. The situation there is almost the same as with PDF/CDF, except that sampling is even more performance sensitive (since it is almost always called in a loop).
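For illustration, the sampling variant of the pattern might look like the sketch below. Only the name `SampleUnchecked` comes from this thread; the `ExponentialSampler` class, its signatures, and the inverse-transform formula are assumptions made for this example and may differ from the actual library code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical example class; NOT the Math.NET implementation.
public sealed class ExponentialSampler
{
    private readonly double _rate;
    private readonly Random _random;

    public ExponentialSampler(double rate, Random random)
    {
        // Parameters are verified once, at construction time.
        if (!(rate > 0.0)) throw new ArgumentOutOfRangeException(nameof(rate));
        _rate = rate;
        _random = random ?? throw new ArgumentNullException(nameof(random));
    }

    // Single sample: no range check needed on an already-validated instance.
    public double Sample()
    {
        return SampleUnchecked(_random, _rate);
    }

    // Infinite lazy sequence; the inner loop never repeats the range check.
    public IEnumerable<double> Samples()
    {
        while (true)
            yield return SampleUnchecked(_random, _rate);
    }

    // Static entry point: validates its arguments, then delegates.
    public static double Sample(Random random, double rate)
    {
        if (random == null) throw new ArgumentNullException(nameof(random));
        if (!(rate > 0.0)) throw new ArgumentOutOfRangeException(nameof(rate));
        return SampleUnchecked(random, rate);
    }

    // Inverse transform sampling: -ln(1 - U) / rate with U uniform in [0, 1).
    // Assumes the arguments have already been checked by the caller.
    private static double SampleUnchecked(Random random, double rate)
    {
        return -Math.Log(1.0 - random.NextDouble()) / rate;
    }
}
```

Here both `Sample()` and `Samples()` share one implementation, so the duplication raised in the original question disappears without adding a range check to the hot loop.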

Jan 15 '14 21:01 cdrnet
