simple-statistics

quantile_rank

Open · jonlachlan opened this issue 10 years ago · 15 comments

I'm looking for a way to calculate the quantile_rank of a specific data point. For example, I may want to know that a data point is in the 80th percentile, or in the 3rd quartile. I think I'd prefer a precise value, for example "this value is at the 3.245 quartile", which I can round to 3 if I want.

I've done this in Postgres using the ntile() window function (http://www.postgresql.org/docs/9.4/static/functions-window.html), but I'd like to do it in Javascript.
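For comparison, here is a rough JavaScript sketch of what `ntile(n)` computes once the values are sorted. The name `ntileBucket` is hypothetical. The docs only promise bucket numbers from 1 to n that divide the partition as equally as possible. The rule used here, where the first buckets absorb any leftover rows, is an assumption about how PostgreSQL breaks that tie.

```javascript
// Hypothetical helper approximating ntile(numBuckets) for the row at 0-based
// position i among totalRows ordered rows. Buckets are numbered 1..numBuckets.
// Assumption: the first (totalRows % numBuckets) buckets get one extra row.
function ntileBucket(i, totalRows, numBuckets) {
  const base = Math.floor(totalRows / numBuckets); // minimum rows per bucket
  const extra = totalRows % numBuckets;            // buckets holding base + 1 rows
  const rowsInBigBuckets = extra * (base + 1);
  if (i < rowsInBigBuckets) {
    return Math.floor(i / (base + 1)) + 1;
  }
  // Only reached when base >= 1, so there is no division by zero.
  return extra + Math.floor((i - rowsInBigBuckets) / base) + 1;
}

const values = [12, 3, 47, 8, 25, 19, 30, 5, 41, 16];
// Copy, then sort numerically (the default sort() compares strings).
const sortedValues = values.slice().sort((a, b) => a - b);
const quartiles = sortedValues.map((v, i) => ntileBucket(i, sortedValues.length, 4));
// sortedValues: [3, 5, 8, 12, 16, 19, 25, 30, 41, 47]
// quartiles:    [1, 1, 1, 2, 2, 2, 3, 3, 4, 4]
```

Buckets are assigned by row position, so equal values can end up in different buckets. The result is also always an integer, so this does not give the fractional "3.245 quartile" asked for above.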

jonlachlan, Aug 05 '15 20:08

I suppose that if I have a sorted array, I know the rank of each value. I can then take that rank as a fraction of the array length and multiply it by the number of quantiles I'm looking for.

So if 75 is the 100th element in an array of 200, its rank is 100/200, or 0.5. Multiply by 100 to get the percentile rank (50), or by 4 to get the quartile rank (2).
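The arithmetic above can be written as a small function. The name `quantileRankFromPosition` is made up for illustration. `position` is assumed to be the value's 1-based position in the sorted array, as in the 100th-of-200 example.

```javascript
// Hypothetical helper: scale a 1-based position in a sorted array of length n
// to a rank on a scale of numQuantiles (100 = percentiles, 4 = quartiles).
function quantileRankFromPosition(position, n, numQuantiles) {
  return (position / n) * numQuantiles;
}

console.log(quantileRankFromPosition(100, 200, 100)); // 50 (percentile rank)
console.log(quantileRankFromPosition(100, 200, 4));   // 2 (quartile rank)
console.log(quantileRankFromPosition(150, 200, 4));   // 3
```

With this convention, the largest value always maps to the top of the scale (100 or 4). The result is fractional, so it can be rounded afterwards if needed.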

I think I'm good to do this on my own, but thanks anyways :)

jonlachlan, Aug 05 '15 20:08

So the way to do this would probably be something like:

var myNumber = ...;
var myArray = [...];
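// Sort numerically: without a comparator, sort() compares values as strings.
var qRank = ss.bisect(myArray.sort(function (a, b) { return a - b; }), myNumber) / myArray.length;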
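The thread does not define `ss.bisect`, so here is a minimal, self-contained sketch of the kind of bisection meant here. It assumes the input is sorted in ascending numeric order. `lowerBound` is a hypothetical name, not a simple-statistics function. It returns the number of elements strictly smaller than the value, so dividing by the length works even when the value is not in the array.

```javascript
// Hypothetical helper: index of the first element >= value in an ascending
// numeric array ("lower bound" bisection), in O(log n) comparisons.
function lowerBound(sorted, value) {
  let lo = 0;
  let hi = sorted.length; // search window is [lo, hi)
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sorted[mid] < value) {
      lo = mid + 1; // answer lies to the right of mid
    } else {
      hi = mid;     // sorted[mid] >= value, so mid may be the answer
    }
  }
  return lo;
}

const myArray = [30, 5, 100, 75, 20];
// Copy before sorting so the original order is kept; compare numerically.
const sorted = myArray.slice().sort((a, b) => a - b); // [5, 20, 30, 75, 100]
const qRank = lowerBound(sorted, 75) / sorted.length; // 3 / 5 = 0.6
```

If a value appears several times, this gives the fraction of values strictly below it, which is the rank of its first occurrence.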

tmcw, Aug 05 '15 21:08

Hmm, I'm not seeing a bisect function. Is that in a new ss dist?

jonlachlan, Aug 05 '15 22:08

There currently isn't one - I'm proposing we could include one

tmcw, Aug 06 '15 14:08

Has anyone made any progress on this? I have an unsorted array of values [x, y, z, w] and want to know which percentile z falls in. Is that a separate issue, or is this the one? I noticed that stats-lite has such a function, but since I already use simple-statistics, I'm not too keen on switching now.

denizdogan, Mar 20 '18 13:03

@denizdogan it seems the bisect function is now part of the current release on npm. Have you tried @tmcw's solution?

Yomguithereal, Mar 20 '18 13:03

What's more, I don't see a function such as the one you need in the stats-lite module.

Yomguithereal, Mar 20 '18 13:03

@tmcw on a side note, should we implement a quantile_rank function based on scipy? https://github.com/scipy/scipy/blob/v1.0.0/scipy/stats/stats.py#L1709-L1802
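For reference, scipy's `percentileofscore` offers several definitions through its `kind` argument. Below is a rough JavaScript sketch of three of them ("strict", "weak" and "mean"). It returns a fraction instead of a percentage. `scoreRank` is a hypothetical name, this is not a port of scipy's code, and scipy's default `kind="rank"` (which averages tied positions) is left out. It works directly on an unsorted array in one linear pass.

```javascript
// Hypothetical helper modeled on the "strict", "weak" and "mean" kinds of
// scipy.stats.percentileofscore; returns a fraction in [0, 1].
function scoreRank(values, score, kind) {
  let less = 0;        // values strictly below score
  let lessOrEqual = 0; // values at or below score
  for (const v of values) {
    if (v < score) less++;
    if (v <= score) lessOrEqual++;
  }
  const n = values.length;
  switch (kind) {
    case "strict":
      return less / n;
    case "weak":
      return lessOrEqual / n;
    case "mean":
      // Average of the strict and weak fractions, computed from counts.
      return (less + lessOrEqual) / (2 * n);
    default:
      throw new Error(`unknown kind: ${kind}`);
  }
}

const data = [1, 2, 3, 3, 4];
console.log(scoreRank(data, 3, "strict")); // 0.4
console.log(scoreRank(data, 3, "weak"));   // 0.8
console.log(scoreRank(data, 3, "mean"));   // 0.6
```

Multiply by 100 to get a percentile. An empty array yields `NaN` here because n is 0.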

Yomguithereal, Mar 20 '18 13:03

@Yomguithereal I spoke too quickly; the function in stats-lite is slightly different from the one I proposed. Anyway, I honestly don't understand how I'm supposed to use the bisect function: its signature is different from the one in the comment above. :/

denizdogan, Mar 20 '18 17:03

@denizdogan yes, it seems the signature changed to take a function rather than an array. So instead, you should probably use a basic binary search (I assume the value you want the percentile of is in your array; otherwise it would be slightly more complex):

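// Binary search over an array sorted in ascending numeric order.
// Returns the index of an element equal to `value` (not necessarily the
// first one if there are duplicates), or -1 if `value` is absent.
function binarySearch(array, value) {
  var lo = 0;
  var hi = array.length - 1;

  while (lo <= hi) {
    // Unsigned right shift halves the sum and yields an integer index.
    var mid = (lo + hi) >>> 1;
    var current = array[mid];

    if (current > value) {
      hi = mid - 1; // value can only be in the left half
    }
    else if (current < value) {
      lo = mid + 1; // value can only be in the right half
    }
    else {
      return mid;
    }
  }

  return -1;
}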

Then:

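// Sort numerically: without a comparator, sort() compares values as strings.
// binarySearch returns -1 if myNumber is absent, so check for that first.
var qRank = binarySearch(myArray.sort(function (a, b) { return a - b; }), myNumber) / myArray.length;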

Yomguithereal, Mar 20 '18 18:03

To get the percentile, multiply by 100 obviously :)

Yomguithereal, Mar 20 '18 18:03

Or, even simpler if you don't care about performance:

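// Linear scan; indexOf returns the first match, or -1 if myNumber is absent.
var qRank = myArray.sort(function (a, b) { return a - b; }).indexOf(myNumber) / myArray.length;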

Yomguithereal, Mar 20 '18 18:03

@Yomguithereal Thanks a lot, much appreciated! :)

denizdogan, Mar 20 '18 18:03

Note, however, that if the value appears multiple times in the array, this formula is not completely correct, and there are many ways to answer the question. For instance, scipy by default takes the mean of the indices where the value is found in the sorted array.
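Here is a minimal sketch of that tie-aware variant for numeric values. It averages the 1-based positions that the tied copies would occupy in the sorted array (1-based as in scipy's default `kind="rank"`, as far as I know) and divides by the length. `tieAveragedRank` is a hypothetical name. Counting replaces sorting: with `below` smaller values and `matches` equal ones, the tied copies sit at positions below + 1 through below + matches.

```javascript
// Hypothetical helper: mean 1-based sorted position of `value`, divided by n.
// If `value` is absent, falls back to the fraction of values below it.
function tieAveragedRank(values, value) {
  let below = 0;   // values strictly smaller than `value`
  let matches = 0; // values equal to `value`
  for (const v of values) {
    if (v < value) below++;
    else if (v === value) matches++;
  }
  const n = values.length;
  if (matches === 0) {
    return below / n;
  }
  // Mean of positions below + 1 ... below + matches.
  return (below + (matches + 1) / 2) / n;
}

console.log(tieAveragedRank([4, 3, 1, 3, 2], 3)); // 0.7 (copies at positions 3 and 4)
console.log(tieAveragedRank([1, 2, 3, 4], 3));    // 0.75
```

Because positions are 1-based here, a value that appears once gets 1/n more than the 0-based `indexOf` formula above. The fallback for absent values is a design choice, not something the thread specifies.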

Yomguithereal, Mar 20 '18 18:03

@denizdogan @tmcw I drafted a PR to add the quantileRankSorted method to the library. @denizdogan, you should probably base your work on it, since it should be more precise than what I suggested earlier.

Yomguithereal, Mar 20 '18 20:03