cpython icon indicating copy to clipboard operation
cpython copied to clipboard

Update statistics.py Faster implementation for normal distribution

Open ThibaultDECO opened this issue 1 year ago • 2 comments
trafficstars

Faster implementation for normal distribution

ThibaultDECO avatar May 11 '24 22:05 ThibaultDECO

All commit authors signed the Contributor License Agreement.
CLA signed

cpython-cla-bot[bot] avatar May 11 '24 22:05 cpython-cla-bot[bot]

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

bedevere-app[bot] avatar May 11 '24 22:05 bedevere-app[bot]

@ThibaultDECO Your PR needs an issue describing the bigfix or inprovements. See https://devguide.python.org/getting-started/pull-request-lifecycle/#quick-guide

About the PR: could you provide benchmarks showing how much this improves performance? I suspect that at least some of the changes have no impact

eendebakpt avatar May 12 '24 13:05 eendebakpt

Nice effort but this doesn't make sense. A 1/2 is already peephole optimized to 0.5.

Also, the variables in the case statement ARE the precomputation. They are closure variables so that the actual function call uses those precomputed values.

Disassembly of the kernel function:

>>> from statistics import kde
>>> from dis import dis
>>> f_hat = kde([0], h=1, kernel='normal')
>>> dis(f_hat.__closure__[0].cell_contents)
  --           COPY_FREE_VARS           1

 923           RESUME                   0
               LOAD_GLOBAL              1 (exp + NULL)
               LOAD_CONST               1 (-0.5)
               LOAD_FAST                0 (t)
               BINARY_OP                5 (*)
               LOAD_FAST                0 (t)
               BINARY_OP                5 (*)
               CALL                     1
               LOAD_DEREF               1 (sqrt2pi)
               BINARY_OP               11 (/)
               RETURN_VALUE

rhettinger avatar May 14 '24 17:05 rhettinger

If you're interested in working on a significant speed-up, I could use some help with the kernel inv_cdf approximation functions in kde_random. If the approximation functions are made more accurate near the end points, there will be significantly fewer iterations in the newton-raphson code.

Both of these could substantially benefit from an afternoon of piecewise curve-fitting or some rational approximation:

def _quartic_invcdf_estimate(p):
    sign, p = (1.0, p) if p <= 1/2 else (-1.0, 1.0 - p)
    x = (2.0 * p) ** 0.4258865685331 - 1.0
    if p >= 0.004 < 0.499:
        x += 0.026818732 * sin(7.101753784 * p + 2.73230839482953)
    return x * sign

def _triweight_invcdf_estimate(p):
    sign, p = (1.0, p) if p <= 1/2 else (-1.0, 1.0 - p)
    x = (2.0 * p) ** 0.3400218741872791 - 1.0
    return x * sign

rhettinger avatar May 14 '24 18:05 rhettinger