KDEpy
Add a new rule of thumb
There is a rule of thumb that should, in theory, perform better than Silverman's rule. The relevant paper is here: https://www.hindawi.com/journals/jps/2015/242683/ Below is my simple Python implementation for one-dimensional data:
import numpy as np

def chens_rule(data):
    std = np.std(data)
    # Normalized IQR: divide by the IQR of a standard normal distribution
    IQR = (np.percentile(data, q=75) - np.percentile(data, q=25)) / 1.3489795003921634
    scale = min(IQR, std)  # robust estimate of scale
    mean = np.mean(data)
    n = len(data)
    if mean != 0 and scale > 0:
        cv = (1 + 1 / (4 * n)) * scale / mean  # corrected for small sample size
        h = ((4 * (2 + cv ** 2)) ** (1 / 5)) * scale * (n ** (-2 / 5))
        return h
    else:
        raise ValueError("Chen's rule failed: mean is zero or scale is not positive")
Note that I made two changes compared to the original paper:
- The estimate of scale is not simply the standard deviation: I use the minimum of the standard deviation and the normalized IQR to make it more robust, similar to Silverman's rule
- I added a small-sample correction to the coefficient of variation. However, it's only appropriate for normally distributed data, so I'm not entirely sure whether it should be used
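
To make it easier to judge how much each of the two changes matters, here is a rough sketch that turns them into switches. The function name `chens_rule_variant` and the flags `robust_scale` and `correct_cv` are hypothetical names I made up for illustration, they are not part of KDEpy or the paper, and the bandwidth formula is just copied from my implementation above:

```python
import numpy as np

NORMAL_IQR = 1.3489795003921634  # IQR of a standard normal distribution


def chens_rule_variant(data, robust_scale=True, correct_cv=True):
    """Hypothetical helper: the same formula as above, with both changes optional."""
    data = np.asarray(data, dtype=float)
    n = data.size
    std = np.std(data)
    if robust_scale:
        # Change 1: use min(std, normalized IQR) instead of the plain std
        q75, q25 = np.percentile(data, [75, 25])
        scale = min((q75 - q25) / NORMAL_IQR, std)
    else:
        scale = std
    mean = np.mean(data)
    if mean == 0 or scale <= 0:
        raise ValueError("Chen's rule failed: mean is zero or scale is not positive")
    cv = scale / mean
    if correct_cv:
        # Change 2: small-sample correction of the coefficient of variation
        cv *= 1 + 1 / (4 * n)
    return (4 * (2 + cv ** 2)) ** (1 / 5) * scale * n ** (-2 / 5)


# Synthetic data, only for illustration
rng = np.random.default_rng(0)
sample = rng.normal(loc=10.0, scale=2.0, size=200)
for robust in (True, False):
    for corrected in (True, False):
        h = chens_rule_variant(sample, robust_scale=robust, correct_cv=corrected)
        print(f"robust_scale={robust}, correct_cv={corrected}: h={h}")
```

With both flags set to `False` this should reduce to the formula with the plain standard deviation and the uncorrected coefficient of variation, which is my reading of the paper's version.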