
Add a new rule of thumb

Open Expertium opened this issue 1 year ago • 2 comments

There is a rule of thumb that should, in theory, perform better than Silverman's rule. The relevant paper is here: https://www.hindawi.com/journals/jps/2015/242683/. Here is my simple Python implementation for one-dimensional data:

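import numpy as np

def chens_rule(data):
    """Return a bandwidth for one-dimensional data using Chen's rule of thumb."""
    std = np.std(data)
    # IQR divided by the IQR of a standard normal, so it is comparable to std
    q25, q75 = np.percentile(data, [25, 75])
    normalized_iqr = (q75 - q25) / 1.3489795003921634
    scale = min(normalized_iqr, std)  # robust estimate of scale
    mean = np.mean(data)
    n = len(data)
    if mean != 0 and scale > 0:
        cv = (1 + 1 / (4 * n)) * scale / mean  # corrected for small sample size
        h = ((4 * (2 + cv ** 2)) ** (1 / 5)) * scale * (n ** (-2 / 5))
        return h
    else:
        # The coefficient of variation is undefined for a zero mean,
        # and a non-positive scale would give a useless bandwidth
        raise ValueError("Chen's rule failed: mean is zero or scale is not positive")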
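
For a rough sense of scale, the sketch below compares the bandwidth from the formula above with the textbook form of Silverman's rule, h = 0.9 * min(std, IQR / 1.34) * n^(-1/5), on synthetic gamma data. The helper names (`robust_scale`, `chen_bandwidth`, `silverman_bandwidth`), the sample, and the seed are assumptions made for illustration, and the Silverman helper reuses the same normalized IQR constant as `chens_rule`; none of this is KDEpy's own implementation.

```python
import numpy as np

def robust_scale(data):
    # Minimum of the standard deviation and the normalized IQR
    q25, q75 = np.percentile(data, [25, 75])
    return min(np.std(data), (q75 - q25) / 1.3489795003921634)

def chen_bandwidth(data):
    # Same formula as chens_rule above, without the error handling
    n = len(data)
    scale = robust_scale(data)
    cv = (1 + 1 / (4 * n)) * scale / np.mean(data)
    return (4 * (2 + cv ** 2)) ** (1 / 5) * scale * n ** (-2 / 5)

def silverman_bandwidth(data):
    # Textbook Silverman rule of thumb (illustrative, not KDEpy's code)
    n = len(data)
    return 0.9 * robust_scale(data) * n ** (-1 / 5)

rng = np.random.default_rng(0)
for n in (100, 1000, 10000):
    sample = rng.gamma(shape=2.0, scale=1.0, size=n)
    print(n, chen_bandwidth(sample), silverman_bandwidth(sample))
```

Because the formula above decays as n^(-2/5) while Silverman's rule decays as n^(-1/5), the ratio of the two bandwidths shrinks as the sample grows.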

Note that I added two changes compared to the original paper:

  1. The scale estimate is not the plain standard deviation: to make it more robust, I use the minimum of the standard deviation and the normalized IQR, similar to Silverman's rule.
  2. I added a sample size correction to the coefficient of variation. However, the correction is only appropriate for normally distributed data, so I'm not entirely sure whether it should be used.
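
To make the two changes concrete, here is a small sketch under made-up data: it shows how the robust scale reacts to a single gross outlier compared with the plain standard deviation, and how large the factor 1 + 1 / (4 * n) is at a few sample sizes. The helper names `robust_scale` and `cv_correction`, the synthetic sample, and the outlier value are assumptions for illustration only.

```python
import numpy as np

def robust_scale(data):
    # Change 1: minimum of the standard deviation and the normalized IQR
    q25, q75 = np.percentile(data, [25, 75])
    return min(np.std(data), (q75 - q25) / 1.3489795003921634)

def cv_correction(n):
    # Change 2: multiplicative small-sample factor applied to the
    # coefficient of variation
    return 1 + 1 / (4 * n)

rng = np.random.default_rng(0)
clean = rng.normal(loc=10.0, scale=1.0, size=200)
contaminated = np.append(clean, 1000.0)  # add one gross outlier

# The standard deviation is inflated by the outlier; the robust scale barely moves
print("clean:       ", np.std(clean), robust_scale(clean))
print("contaminated:", np.std(contaminated), robust_scale(contaminated))

# The correction factor approaches 1 quickly as n grows
for n in (5, 20, 100, 1000):
    print(n, cv_correction(n))
```

The correction only matters for very small samples: it is 5% at n = 5 and already below 0.3% at n = 100, so dropping it would change the bandwidth very little for typical sample sizes.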

Expertium, Jan 15 '24 15:01