KDEpy Add a new rule of thumb

Open Expertium opened this issue 1 year ago • 2 comments

There is a rule of thumb that should, in theory, perform better than Silverman's rule (paper: https://www.hindawi.com/journals/jps/2015/242683/). Here is my simple Python implementation for one-dimensional data:

import numpy as np

def chens_rule(data):
    std = np.std(data)
    # IQR rescaled so that it estimates the standard deviation of normal data
    q25, q75 = np.percentile(data, [25, 75])
    IQR = (q75 - q25) / 1.3489795003921634
    scale = min(IQR, std)  # robust scale estimate
    mean = np.mean(data)
    n = len(data)
    if mean != 0 and scale > 0:
        cv = (1 + 1 / (4 * n)) * scale / mean  # corrected for small sample size
        h = ((4 * (2 + cv ** 2)) ** (1 / 5)) * scale * (n ** (-2 / 5))
        return h
    else:
        # cv is undefined for a zero mean, and h would be zero for a zero scale
        raise ValueError("Chen's rule failed")
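
For reference, the function above computes the following. This is transcribed from the code, not from the paper, and the symbols $s$, $\hat\sigma$, $Q_{25}$, $Q_{75}$ and $\bar{x}$ are notation introduced here for the scale, standard deviation, quartiles and sample mean:

```latex
\begin{aligned}
s &= \min\!\left(\hat\sigma,\ \frac{Q_{75} - Q_{25}}{1.349}\right), \\
\widehat{cv} &= \left(1 + \frac{1}{4n}\right)\frac{s}{\bar{x}}, \\
h &= \left[4\left(2 + \widehat{cv}^{\,2}\right)\right]^{1/5} s\, n^{-2/5}.
\end{aligned}
```

Note that the bandwidth shrinks as $n^{-2/5}$ here, whereas Silverman's rule uses $n^{-1/5}$.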

Note that I made two changes compared to the original paper:

1. The scale estimate is not the plain standard deviation: I take the smaller of the standard deviation and the normalized IQR to make it more robust, similar to Silverman's rule.
2. I added a small-sample correction to the coefficient of variation. However, that correction is only appropriate for normally distributed data, so I'm not entirely sure whether it should be used.
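
To make it easier to judge each change on its own, here is a rough sketch that puts both behind flags. The function name `chens_rule_variant` and the parameters `robust_scale` and `small_sample_correction` are hypothetical and exist only for illustration; they are not part of KDEpy or of the paper. Setting both flags to `False` should give the rule with the plain standard deviation and the uncorrected coefficient of variation.

```python
import numpy as np


# Hypothetical helper: the name and both flags are illustrative only,
# they are not part of KDEpy or of the paper.
def chens_rule_variant(data, robust_scale=True, small_sample_correction=True):
    data = np.asarray(data, dtype=float)
    n = data.size
    std = np.std(data)
    if robust_scale:
        # Change 1: smaller of std and the normalized IQR
        q25, q75 = np.percentile(data, [25, 75])
        scale = min(std, (q75 - q25) / 1.3489795003921634)
    else:
        scale = std
    mean = np.mean(data)
    if mean == 0 or scale <= 0:
        raise ValueError("Chen's rule is undefined for zero mean or zero scale")
    cv = scale / mean
    if small_sample_correction:
        # Change 2: correction factor, justified for normal data only
        cv *= 1 + 1 / (4 * n)
    return (4 * (2 + cv ** 2)) ** (1 / 5) * scale * n ** (-2 / 5)


# Compare all four combinations on synthetic data (assumed example data)
rng = np.random.default_rng(0)
sample = rng.normal(loc=10.0, scale=2.0, size=200)
for robust in (True, False):
    for corrected in (True, False):
        h = chens_rule_variant(sample, robust, corrected)
        print(f"robust_scale={robust}, correction={corrected}: h={h:.4f}")
```

Since the correction factor approaches 1 as n grows, the second flag should only matter noticeably for small samples.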

Jan 15 '24 15:01 Expertium
