simple-statistics
quantile_rank
I'm looking for a way to calculate the quantile_rank of a specific data point. For example, I may want to know that a data point is in the 80th percentile, or in the 3rd quartile. I think I'd prefer a precise value, for example "this value is at the 3.245 quartile", which I can round to 3 if I want.
I've done this in Postgres using the ntile() window function (http://www.postgresql.org/docs/9.4/static/functions-window.html), but I'd like to do it in JavaScript.
I suppose if I have a sorted array, I know the rank of each value; I can take that as a fraction of the array length, then multiply by the number of quantiles I'm looking for.
So if 75 is the 100th element in an array of 200, its rank as a fraction is 100/200, or 0.5. Multiply by 100 to get the percentile rank (50), or by 4 to get the quartile rank (2).
I think I'm good to do this on my own, but thanks anyway :)
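The arithmetic above can be sketched as a tiny helper. The name `quantileRankFromPosition` and its parameters are made up for illustration and are not part of simple-statistics:

```javascript
// Hypothetical helper, not part of simple-statistics.
// position:  1-based rank of the value in the sorted array
// length:    number of elements in the array
// quantiles: 100 for percentiles, 4 for quartiles, and so on
function quantileRankFromPosition(position, length, quantiles) {
    return (position / length) * quantiles;
}

var percentileRank = quantileRankFromPosition(100, 200, 100); // 50
var quartileRank = quantileRankFromPosition(100, 200, 4); // 2
```

The result is left unrounded, so a caller who wants "3rd quartile" rather than "3.245" can round it however they prefer.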
so the way to do this would probably be something like
var myNumber = ...;
var myArray = [...];
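// Sort a copy numerically; the default sort() compares elements as strings.
var sorted = myArray.slice().sort(function (a, b) { return a - b; });
var qRank = ss.bisect(sorted, myNumber) / myArray.length;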
Hmm I'm not seeing a bisect function, is that in a new ss dist?
There currently isn't one - I'm proposing we could include one
Has anyone made any progress on this? I have an unsorted array of values as [x, y, z, w] and I'm looking to know what percentile z is in. Is that a separate issue, or is this the one? I noticed that stats-lite has such a function, but since I already use simple-statistics, I'm not too keen on switching now.
@denizdogan it seems the bisect function is now part of the current release on npm. Have you tried @tmcw's solution?
What's more, I don't see a function such as the one you need in the stats-lite module.
@tmcw on a side note, should we implement a quantile_rank function based on scipy? https://github.com/scipy/scipy/blob/v1.0.0/scipy/stats/stats.py#L1709-L1802
@Yomguithereal I spoke too quickly, the function in stats-lite is slightly different from the one I proposed. Anyways, I honestly don't understand how I'm supposed to use the bisect function; its signature is different from the one in the comment above. :/
@denizdogan yes, it seems the signature changed to take a function rather than an array. So instead, you should probably use a basic binary search (I assume the value you want the percentile of is in your array; otherwise it would be slightly more complex):
// Returns the index of `value` in the sorted `array`, or -1 if it is absent.
function binarySearch(array, value) {
    var lo = 0;
    var hi = array.length - 1;
    var mid;
    var current;

    while (lo <= hi) {
        // Unsigned shift floors the midpoint.
        mid = (lo + hi) >>> 1;
        current = array[mid];

        if (current > value) {
            hi = mid - 1;
        }
        else if (current < value) {
            lo = mid + 1;
        }
        else {
            return mid;
        }
    }

    return -1;
}
Then:
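// Sort a copy numerically; the default sort() compares elements as strings.
var sorted = myArray.slice().sort(function (a, b) { return a - b; });
var qRank = binarySearch(sorted, myNumber) / myArray.length;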
To get the percentile, multiply by 100 obviously :)
Or, even simpler if you don't care about performance:
var qRank = myArray.slice().sort(function (a, b) { return a - b; }).indexOf(myNumber) / myArray.length;
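For the "slightly more complex" case where the value may not be in the array, one option is a lower-bound search that counts how many elements are strictly smaller. This is only a minimal sketch assuming numeric input; `countBelow` and `quantileRank` are hypothetical names, not simple-statistics functions:

```javascript
// Hypothetical helpers, not part of simple-statistics.

// Returns the number of elements in `sorted` that are strictly less
// than `value`. `sorted` must be in ascending numeric order.
function countBelow(sorted, value) {
    var lo = 0;
    var hi = sorted.length;
    while (lo < hi) {
        var mid = (lo + hi) >>> 1;
        if (sorted[mid] < value) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return lo;
}

// Fraction of the array strictly below `value`, between 0 and 1.
function quantileRank(array, value) {
    // Copy before sorting so the caller's array is left untouched, and
    // sort numerically: the default sort() compares elements as strings.
    var sorted = array.slice().sort(function (a, b) { return a - b; });
    return countBelow(sorted, value) / sorted.length;
}

var rankOf25 = quantileRank([40, 10, 30, 20], 25); // 0.5
```

Unlike the exact-match search, this never returns -1, so the result stays in the 0 to 1 range for any numeric value and a non-empty array.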
@Yomguithereal Thanks a lot, much appreciated! :)
Note, however, that if the value occurs multiple times in the array, this formula is not completely correct, and there are several ways to answer the question. For instance, scipy by default would take the mean of the indices where your value is found in the sorted array.
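A rough sketch of the mean-of-indices idea, for anyone who needs it in the meantime. `meanIndexRank` is a hypothetical name, and this is not claimed to reproduce scipy's output exactly; it only averages the 0-based positions the value would occupy in the sorted array:

```javascript
// Hypothetical helper, not part of simple-statistics.
// Counting elements below and equal to `value` gives the same positions
// a numeric sort would, so the array does not need to be sorted first.
function meanIndexRank(array, value) {
    var below = 0;
    var equal = 0;
    for (var i = 0; i < array.length; i++) {
        if (array[i] < value) {
            below++;
        } else if (array[i] === value) {
            equal++;
        }
    }
    // Value not present: fall back to the fraction strictly below it.
    if (equal === 0) {
        return below / array.length;
    }
    // In sorted order the matches sit at indices `below` through
    // `below + equal - 1`; their mean is the midpoint of that range.
    return (below + (equal - 1) / 2) / array.length;
}

var tiedRank = meanIndexRank([1, 2, 2, 3], 2); // 0.375
```

With no duplicates this reduces to the index / length formula used earlier in the thread.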
@denizdogan @tmcw I drafted a PR to add the quantileRankSorted method to the library. @denizdogan you should probably base your work on it since it should be more precise than things I've told you earlier.