math-php

Studentized Range Distribution CDF

Open markrogoyski opened this issue 8 years ago • 29 comments

To add Tukey's Range Test of statistical significance, it seems that we need the Studentized Range Distribution CDF.

Tukey's Range Test, Studentized Range, Studentized Range Distribution

I have not been able to find many details on how to actually compute the CDF. I've seen some mentions of old Fortran algorithms and some complex approximations, but I haven't seen anything that I would consider the definitive method to calculate this.

Is anyone familiar with how to compute the CDF of this distribution, or could point out some reference that has a method that is considered 'the right way' to do it?

Thanks.

markrogoyski avatar Sep 16 '16 21:09 markrogoyski

I've been referencing the "Numerical Recipes" book, Boost, and this: http://www.stat.rice.edu/~dobelman/textfiles/DistributionsHandbook.pdf

There's nothing in any of those sources. Is this helpful?

https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Tukey.html

Beakerboy avatar Sep 17 '16 12:09 Beakerboy

Fortran code: http://lib.stat.cmu.edu/apstat/190

Beakerboy avatar Sep 19 '16 17:09 Beakerboy

Yeah, I ran into that Fortran code as well. I suppose as a first attempt I could try to implement that in PHP and see if the output resembles the distribution. PHP does have the goto operator after all =) http://php.net/manual/en/control-structures.goto.php

markrogoyski avatar Sep 19 '16 18:09 markrogoyski

I think this is it: https://projecteuclid.org/download/pdf_1/euclid.aoms/1177705684

Beakerboy avatar Sep 19 '16 19:09 Beakerboy

Cool. That's a good reference since it has all the tables precomputed.

I just skimmed through it, but this line stood out: "The method of calculation of the probability integral ... will not be included here."

Hopefully there is enough information to figure this out. It's strange that Tukey's Range Test is fairly common, yet the distribution used to compute it has very little information about it online.

markrogoyski avatar Sep 19 '16 19:09 markrogoyski

It does have the equation to calculate the moments. Can you use that to figure out the PDF?

Beakerboy avatar Sep 19 '16 19:09 Beakerboy

https://www.jstor.org/stable/2332134?seq=1#page_scan_tab_contents

Beakerboy avatar Sep 19 '16 19:09 Beakerboy

On that document's page, the p(x) looks like the Normal distribution PDF function. Then it states "No simple expressions exists for the probability law fn(w) of w..."

I'm not sure how to use this information to code something up. All the other distributions are on Wikipedia with nice formulas =)

markrogoyski avatar Sep 19 '16 19:09 markrogoyski

Did you look at page 309, after the first set of tables? I think that explains it as a double integral. When n equals two, it somehow reduces to the standard normal distribution.

Beakerboy avatar Sep 19 '16 21:09 Beakerboy

Ahh, I didn't realize there were more pages. OK. So if I register I can view the entire article. Thanks for pointing that out.

markrogoyski avatar Sep 19 '16 21:09 markrogoyski

Yes, register and it's free, although the scan quality is pretty poor. I'm having a hard time reading some of the superscripts.

Beakerboy avatar Sep 19 '16 22:09 Beakerboy

Is this helpful? https://en.wikipedia.org/wiki/Range_(statistics)#Distribution

Beakerboy avatar Sep 20 '16 00:09 Beakerboy

I think I figured this out. I used the formula in the above "Range" Wikipedia article, where the distribution was the standard normal distribution. The critical values seem to agree with a Tukey table with infinite degrees of freedom.

https://docs.google.com/a/uwalumni.com/spreadsheets/d/13M2Z4F6tTE0VVVLvGdynwBdBekpFLoBE5KTTCs3JcrI/edit?usp=sharing

Edit: However, replacing it with a t distribution does not seem to agree with non-infinite df values.
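
For anyone who wants to reproduce the infinite-df check without the spreadsheet, here is a minimal sketch of the Range formula with a standard normal parent: P(range of k samples <= w) = k * integral of phi(x) * [Phi(x + w) - Phi(x)]^(k - 1) dx. The function names, the [-8, 8] integration limits, and the step count are my own choices for illustration, not anything taken from the article.

```python
import math


def phi(x):
    """Standard normal PDF."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def range_cdf_normal(w, k, lo=-8.0, hi=8.0, steps=2000):
    """P(range of k iid N(0,1) draws <= w), by composite Simpson's rule.

    The limits and step count are assumptions; steps must be even.
    """
    if w <= 0.0:
        return 0.0
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        # Simpson weights: 1, 4, 2, 4, ..., 2, 4, 1
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * phi(x) * (Phi(x + w) - Phi(x)) ** (k - 1)
    return k * total * h / 3.0


if __name__ == "__main__":
    # Tukey tables list q = 3.314 for alpha = 0.05, k = 3, df = infinity,
    # so this should print something close to 0.95.
    print(range_cdf_normal(3.314, 3))
```

For k = 2 this reduces to 2 * Phi(w / sqrt(2)) - 1, which is a handy sanity check on the integration.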

Beakerboy avatar Sep 20 '16 13:09 Beakerboy

Here's a question I posted on stack exchange: http://stats.stackexchange.com/questions/235785/calculate-the-critical-value-of-tukey-q/235979#235979

Beakerboy avatar Sep 20 '16 15:09 Beakerboy

Thanks for continuing to look into this. Hopefully someone other than you answers your Stack Exchange question.

With so little information available, I wonder if the online ANOVA calculators that do the Tukey's Range test are just using pre-computed tables.

markrogoyski avatar Sep 20 '16 17:09 markrogoyski

I think I found the R implementation for these functions:

ptukey: https://github.com/wch/r-source/blob/e5b21d0397c607883ff25cca379687b86933d730/src/nmath/ptukey.c

qtukey: https://github.com/wch/r-source/blob/e5b21d0397c607883ff25cca379687b86933d730/src/nmath/qtukey.c

markrogoyski avatar Sep 20 '16 17:09 markrogoyski

When you figure out what they do on a theoretical level, I'd love to know.

Beakerboy avatar Sep 20 '16 21:09 Beakerboy

Thinking this over... the studentized range is supposed to include the standard deviation of the samples. When the number of samples (df) approaches infinity, the estimate s will approach the true standard deviation, which is one for the standard normal. I'm assuming there's a missing factor to correct for the estimation of s from the samples somewhere: https://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation

Beakerboy avatar Sep 21 '16 13:09 Beakerboy

I think I figured it out: it's a two-tailed test. I don't know how to modify my integral to account for that though. If you compare a chart of critical t values for a two-tailed test at .05, and multiply by sqrt(2), it will match a Tukey chart of critical q with k=2 and the same df.
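
A rough sketch of that k=2 check, assuming equal group sizes so that q = sqrt(2) * |t| and therefore P(Q <= q) = P(|T| <= q / sqrt(2)). The helper names and the Simpson step count are mine, chosen only for illustration.

```python
import math


def t_pdf(t, df):
    """Student's t probability density."""
    c = math.gamma((df + 1) / 2.0) / (math.sqrt(df * math.pi) * math.gamma(df / 2.0))
    return c * (1.0 + t * t / df) ** (-(df + 1) / 2.0)


def t_two_sided_cdf(t, df, steps=2000):
    """P(|T| <= t): Simpson's rule on [0, t], doubled by symmetry."""
    if t <= 0.0:
        return 0.0
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2.0 * total * h / 3.0


def studentized_range_cdf_k2(q, df):
    """Studentized range CDF for k = 2 only, via q = sqrt(2) * |t|."""
    return t_two_sided_cdf(q / math.sqrt(2.0), df)


if __name__ == "__main__":
    # Two-tailed t critical value at .05 with df = 10 is 2.228;
    # 2.228 * sqrt(2) is about 3.151, and Tukey tables list 3.15 for k = 2, df = 10.
    print(2.228 * math.sqrt(2.0))
    print(studentized_range_cdf_k2(3.151, 10))  # should be close to 0.95
```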

Beakerboy avatar Sep 21 '16 14:09 Beakerboy

Once we have it all figured out, someone should write a blog post or update the Wikipedia page. It would end up being the definitive online source for this distribution.

markrogoyski avatar Sep 22 '16 04:09 markrogoyski

I started an article on Wikipedia: https://en.wikipedia.org/wiki/Studentized_range_distribution

Beakerboy avatar Sep 26 '16 16:09 Beakerboy

If you click "Next item" a few times in the Biometrika article above, to page 334, there's another technical article that is probably helpful.

The Range in Random Samples H. O. Hartley Biometrika Vol. 32, No. 3/4 (Apr., 1942), pp. 334-348

Beakerboy avatar Sep 26 '16 17:09 Beakerboy

...And I think I finally found the generalized equation. It's in the wiki article. I'm trying to verify this PDF by numerically integrating it to the CDF using a t distribution for f(q).
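
For reference, this is the CDF form as I understand it, written out as a sketch rather than quoted from the article. The symbols are my own labels: k is the number of groups, \nu the degrees of freedom, and \varphi and \Phi the standard normal PDF and CDF.

```latex
F_R(q; k, \nu) =
  \frac{\sqrt{2\pi}\, k\, \nu^{\nu/2}}{\Gamma(\nu/2)\, 2^{\nu/2 - 1}}
  \int_0^\infty s^{\nu - 1}\, \varphi\!\left(\sqrt{\nu}\, s\right)
  \left[ \int_{-\infty}^{\infty} \varphi(z)
         \left[ \Phi(z + q s) - \Phi(z) \right]^{k - 1} dz \right] ds
```

The inner integral, times k, is the range CDF for k standard normal samples evaluated at q s. The outer integral averages it over the distribution of the standard deviation estimate s.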

Beakerboy avatar Sep 26 '16 19:09 Beakerboy

Wow. Great work on the wiki article. Thanks for doing this.

markrogoyski avatar Sep 26 '16 20:09 markrogoyski

Here's the literature source for the Fortran code from above. I think this fills in some of the blanks on what it is actually numerically integrating. http://www.jstor.org/stable/2347300?seq=1#page_scan_tab_contents

Beakerboy avatar Oct 11 '16 17:10 Beakerboy

Here's a paper with the same equation in a different form. I'm still trying to figure out how this formula arises. I think I have an intuitive sense of the inner integral, but I have to figure out why the outer one estimates the standard deviation. Related to the Chi-Squared distribution somehow? http://link.springer.com/article/10.3758/BF03202264
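
My working guess at the chi-squared link: for unit-variance normal data, df * s^2 follows a chi-squared distribution with df degrees of freedom, so s is distributed as chi_df / sqrt(df). Conditional on s, the event Q <= q is just "normal range <= q * s", and the outer integral averages that over the density of s. Here is a minimal numerical sketch under that assumption. The function names, integration limits, and step counts are my own illustrative choices, and this is not the AS 190 or R algorithm.

```python
import math


def phi(x):
    """Standard normal PDF."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simpson(f, a, b, steps):
    """Composite Simpson's rule; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def range_cdf_normal(w, k):
    """Inner integral: P(range of k iid N(0,1) draws <= w)."""
    return k * simpson(lambda z: phi(z) * (Phi(z + w) - Phi(z)) ** (k - 1),
                       -8.0, 8.0, 200)


def scaled_chi_pdf(s, df):
    """Density of s = chi_df / sqrt(df) for s > 0, computed in log space."""
    log_f = (0.5 * df * math.log(df) + (df - 1) * math.log(s) - 0.5 * df * s * s
             - math.lgamma(0.5 * df) - (0.5 * df - 1.0) * math.log(2.0))
    return math.exp(log_f)


def studentized_range_cdf(q, k, df, steps=400):
    """Outer integral: average the normal range CDF over the density of s."""
    if q <= 0.0:
        return 0.0

    def integrand(s):
        if s <= 0.0:
            return 0.0  # the range CDF is zero at w = 0
        return scaled_chi_pdf(s, df) * range_cdf_normal(q * s, k)

    # Assumed cutoff: s has standard deviation of roughly 1 / sqrt(2 * df).
    upper = 1.0 + 8.0 / math.sqrt(df)
    return simpson(integrand, 0.0, upper, steps)


if __name__ == "__main__":
    # Tukey tables list q = 3.88 for alpha = 0.05, k = 3, df = 10,
    # so this should print something close to 0.95.
    print(studentized_range_cdf(3.877, 3, 10))
```

The nested Simpson loops are slow but easy to compare against table values; a production version would presumably use Gauss-Legendre quadrature, as the Fortran code appears to.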

Beakerboy avatar Oct 12 '16 18:10 Beakerboy

Cool. Thanks for finding and sharing the Fortran code.

markrogoyski avatar Oct 13 '16 00:10 markrogoyski

I have something in the works on this if you would like to putz with it: https://github.com/Beakerboy/math-php/blob/StudentizedRange/src/Probability/Distribution/Continuous/StudentizedRange.php

Beakerboy avatar May 16 '17 15:05 Beakerboy

Thanks for continuing to work on this!

markrogoyski avatar May 17 '17 04:05 markrogoyski