node-faststats

Percentiles function should use interpolation

Open OAGr opened this issue 9 years ago • 2 comments

First, fantastic library.

Second, it seems to me that node-faststats calculates percentiles without interpolating. For instance, if my data points are 1, 3, 6, the 5th percentile would be 1 rather than an interpolated value between 1 and 3 (1.2 under the method Excel's PERCENTILE function uses).

Does this seem like a reasonable change? I imagine it could also be an optional parameter, though that could make things more complicated.

The reason I bring this up is that Excel uses interpolation for its percentile function. http://www.excelfunctions.net/Excel-Percentile-Function.html
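For reference, here is a minimal sketch of the interpolated calculation, assuming the convention Excel's PERCENTILE uses (a fractional rank of p × (n − 1) over the sorted data, interpolated linearly between the two neighbouring points). The name `percentileInterpolated` is hypothetical and is not part of node-faststats. Under this convention the 1, 3, 6 example yields roughly 1.2 for the 5th percentile.

```javascript
// Hypothetical helper, not part of node-faststats.
// Linear interpolation between the two closest ranks; p is given in [0, 100].
function percentileInterpolated(values, p) {
  if (values.length === 0 || p < 0 || p > 100) return NaN;
  const sorted = [...values].sort((a, b) => a - b); // numeric sort on a copy
  const rank = (p / 100) * (sorted.length - 1);     // fractional 0-based index
  const lo = Math.floor(rank);
  const hi = Math.ceil(rank);
  // When rank is an integer, lo === hi and the data point itself is returned.
  return sorted[lo] + (rank - lo) * (sorted[hi] - sorted[lo]);
}

console.log(percentileInterpolated([1, 3, 6], 50)); // the middle point, 3
console.log(percentileInterpolated([1, 3, 6], 5));  // a value between 1 and 3
```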

OAGr, Mar 18 '16 22:03

I am ok with it being an optional parameter, but by definition, percentiles should not use interpolation. A percentile is required to be a point within the dataset. The only exception is the median which for an even number of points is the arithmetic mean of the two middle points.

What might be good is to do what Redshift does with its PERCENTILE_DISC and PERCENTILE_CONT functions. The former is guaranteed to return a point from the set, while the latter uses interpolation. This works well with data that can be either continuous or discrete. It could be a single function with an optional argument for the type we need.
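A rough sketch of what such a single function might look like, modeled loosely on the discrete/continuous split described above. The function name `percentile`, the `type` argument, and its values `"disc"` and `"cont"` are all assumptions for illustration; none of this is existing node-faststats API.

```javascript
// Hypothetical API sketch, not node-faststats code.
// type "disc" always returns a member of the data set; "cont" interpolates.
function percentile(values, p, type = "disc") {
  if (values.length === 0 || p < 0 || p > 100) return NaN;
  const sorted = [...values].sort((a, b) => a - b); // numeric sort on a copy
  const n = sorted.length;

  if (type === "disc") {
    // Smallest value with at least p percent of the data at or below it.
    const idx = Math.max(Math.ceil((p / 100) * n) - 1, 0);
    return sorted[idx];
  }
  if (type === "cont") {
    // Linear interpolation at fractional 0-based index p/100 * (n - 1).
    const rank = (p / 100) * (n - 1);
    const lo = Math.floor(rank);
    const hi = Math.ceil(rank);
    return sorted[lo] + (rank - lo) * (sorted[hi] - sorted[lo]);
  }
  throw new Error(`unknown percentile type: ${type}`);
}

console.log(percentile([1, 3, 6], 5));         // a point from the set
console.log(percentile([1, 3, 6], 5, "cont")); // interpolated between 1 and 3
```

Defaulting to `"disc"` keeps the existing non-interpolating behaviour for callers that pass no third argument. Note that in this sketch the discrete median of an even-sized set is the lower of the two middle points, not their mean, so the median would still need its own handling.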

Feel like sending in a patch?

bluesmoon, Mar 19 '16 03:03

That was my impression of percentiles too; I was surprised that Excel does it that way.

I'll see if I have time in the next few weeks to work on it.

Feel free to close this issue if you like.

OAGr, Mar 19 '16 03:03