math-php
Studentized Range Distribution CDF
To add Tukey's Range Test of statistical significance, it seems that we need the Studentized Range Distribution CDF.
Related topics: Tukey's Range Test, Studentized Range, Studentized Range Distribution
I have not been able to find many details on how to actually compute the CDF. I've seen some mentions of old Fortran algorithms and some complex approximations, but I haven't seen anything that I would consider the definitive method to calculate this.
Is anyone familiar with how to compute the CDF of this distribution, or could point out some reference that has a method that is considered 'the right way' to do it?
Thanks.
I've been referencing the "Numerical Recipes" book, Boost, and this: http://www.stat.rice.edu/~dobelman/textfiles/DistributionsHandbook.pdf
There's nothing in any of those sources. Is this helpful?
https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Tukey.html
Fortran code: http://lib.stat.cmu.edu/apstat/190
Yeah, I ran into that Fortran code as well. I suppose as a first attempt I could try to implement that in PHP and see if the output resembles the distribution. PHP does have the goto operator after all =) http://php.net/manual/en/control-structures.goto.php
I think this is it: https://projecteuclid.org/download/pdf_1/euclid.aoms/1177705684
Cool. That's a good reference since it has all the tables precomputed.
I just skimmed through it, but this line stood out: "The method of calculation of the probability integral ... will not be included here."
Hopefully there is enough information to figure this out. It's strange that Tukey's Range Test is fairly common, yet there is very little information online about the distribution it relies on.
It does have the equation to calculate the moments. Can you use that to figure out the PDF?
https://www.jstor.org/stable/2332134?seq=1#page_scan_tab_contents
On that document's page, the p(x) looks like the Normal distribution PDF. Then it states: "No simple expressions exists for the probability law fn(w) of w..."
I'm not sure how to use this information to code something up. All the other distributions are on Wikipedia with nice formulas =)
Did you look at page 309, after the first set of tables? I think that explains it as a double integral. When n equals two, it somehow reduces to the standard normal distribution.
Ahh, I didn't realize there were more pages. OK. So if I register I can view the entire article. Thanks for pointing that out.
Yes, register and it's free, although the scan quality is pretty poor. I'm having a hard time reading some of the superscripts.
Is this helpful? https://en.wikipedia.org/wiki/Range_(statistics)#Distribution
I think I figured this out. I used the formula in the above "Range" Wikipedia article, where the distribution was the standard normal distribution. The critical values seem to agree with a Tukey table with infinite degrees of freedom.
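For anyone following along, here is a minimal sketch of that calculation in Python rather than PHP, so it is easy to run standalone. It assumes the range formula takes the form F(w) = k * integral of g(z) * [G(z + w) - G(z)]^(k-1) dz, with g and G the standard normal PDF and CDF. The function names, integration limits, and step count are my own choices, not anything from math-php or R.

```python
import math

def phi(z):
    """Standard normal PDF."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def range_cdf(w, k, lo=-8.0, hi=8.0, steps=2000):
    """P(range of k standard normals <= w), by composite Simpson's rule.

    Assumed formula: k * integral phi(z) * (Phi(z + w) - Phi(z))**(k - 1) dz.
    `steps` must be even; the limits truncate the negligible normal tails.
    """
    if w <= 0.0:
        return 0.0
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        z = lo + i * h
        if i == 0 or i == steps:
            weight = 1
        elif i % 2:
            weight = 4
        else:
            weight = 2
        total += weight * phi(z) * (Phi(z + w) - Phi(z)) ** (k - 1)
    return k * total * h / 3.0

# Compare with the 0.05 critical value for k = 3 at infinite df (about 3.314).
print(range_cdf(3.314, 3))
```

For k = 2 the result can be checked against the closed form 2 * Phi(w / sqrt(2)) - 1, since the range of two standard normals is the absolute value of a normal variable with variance 2.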
https://docs.google.com/a/uwalumni.com/spreadsheets/d/13M2Z4F6tTE0VVVLvGdynwBdBekpFLoBE5KTTCs3JcrI/edit?usp=sharing
Edit: However, replacing it with a t distribution does not seem to agree with non-infinite df values.
Here's a question I posted on stack exchange: http://stats.stackexchange.com/questions/235785/calculate-the-critical-value-of-tukey-q/235979#235979
Thanks for continuing to look into this. Hopefully someone other than you answers your Stack Exchange question.
With so little information available, I wonder if the online ANOVA calculators that do the Tukey's Range test are just using pre-computed tables.
I think I found the R implementation for these functions:
ptukey: https://github.com/wch/r-source/blob/e5b21d0397c607883ff25cca379687b86933d730/src/nmath/ptukey.c
qtukey: https://github.com/wch/r-source/blob/e5b21d0397c607883ff25cca379687b86933d730/src/nmath/qtukey.c
When you figure out what they do on a theoretical level, I'd love to know.
Thinking this over...the studentized range is supposed to include the standard deviation of the samples. When the number of samples (df) approaches infinity, this estimate of s will approach one. I'm assuming there's a missing factor to correct for the estimation of s from the samples somewhere: https://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation
I think I figured it out: it's a two-tailed test. I don't know how to modify my integral to account for that, though. If you take a chart of critical t values for a two-tailed test at .05 and multiply by sqrt(2), it will match a Tukey chart of critical q with k=2 and the same df.
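The sqrt(2) relationship can be written out directly. This is a sketch assuming two groups with equal sample size n and a pooled standard deviation s on ν degrees of freedom; the symbols are my own notation.

```latex
q = \frac{\lvert \bar{y}_1 - \bar{y}_2 \rvert}{s / \sqrt{n}},
\qquad
t = \frac{\bar{y}_1 - \bar{y}_2}{s \sqrt{2/n}}
\quad\Longrightarrow\quad
q = \sqrt{2}\,\lvert t \rvert,
\qquad
q_{\alpha;\,2,\,\nu} = \sqrt{2}\; t_{\alpha/2;\,\nu}
```

For example, with infinite df the two-tailed .05 critical t is about 1.960, and 1.960 * sqrt(2) is about 2.772, which is the tabulated q for k=2.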
Once we have it all figured out, someone should write a blog post or update the Wikipedia page. It would end up being the definitive online source for this distribution.
I started an article on Wikipedia: https://en.wikipedia.org/wiki/Studentized_range_distribution
If you click "Next item" a few times in the Biometrika article above, to page 334, there's another technical article that is probably helpful.
The Range in Random Samples H. O. Hartley Biometrika Vol. 32, No. 3/4 (Apr., 1942), pp. 334-348
...And I think I finally found the generalized equation. It's in the wiki article. I'm trying to verify this PDF by numerically integrating it to the CDF using a t distribution for f(q).
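For reference, one way to write the CDF for k groups and ν degrees of freedom is below. This is my own arrangement of the double integral, with φ and Φ the standard normal PDF and CDF, so the notation may not match the article term for term.

```latex
F(q; k, \nu) = \int_0^\infty f_\nu(s)\, H_k(q s)\, ds,
\qquad
H_k(w) = k \int_{-\infty}^{\infty} \varphi(z)\,\bigl[\Phi(z + w) - \Phi(z)\bigr]^{k-1}\, dz,
\qquad
f_\nu(s) = \frac{\nu^{\nu/2}}{\Gamma(\nu/2)\, 2^{\nu/2 - 1}}\; s^{\nu - 1} e^{-\nu s^2 / 2}
```

Here H_k is the CDF of the range of k standard normals, and f_ν is the density of s = sqrt(χ²_ν / ν), the estimated standard deviation relative to the true one.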
Wow. Great work on the wiki article. Thanks for doing this.
Here's the literature source for the Fortran code from above. I think this fills in some of the blanks on what it is actually numerically integrating. http://www.jstor.org/stable/2347300?seq=1#page_scan_tab_contents
Here's a paper with the same equation in a different form. I'm still trying to figure out how this formula arises. I think I have an intuitive sense of the inner integral, but I have to figure out why the outer one estimates the standard deviation. Related to the Chi-Squared distribution somehow? http://link.springer.com/article/10.3758/BF03202264
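On the chi-squared question, here is a hedged sketch of how the two integrals can fit together, in Python so it runs standalone. It assumes ν·s²/σ² follows a chi-squared distribution with ν degrees of freedom, so the outer integral averages the known-sigma range CDF over the density of s/σ. All function names, limits, and step counts are my own; this is not the R or Fortran algorithm.

```python
import math

def phi(z):
    """Standard normal PDF."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simpson(f, lo, hi, steps):
    """Composite Simpson's rule; `steps` must be even."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0

def range_cdf(w, k):
    """Inner integral: CDF of the range of k standard normals."""
    if w <= 0.0:
        return 0.0
    return k * simpson(
        lambda z: phi(z) * (Phi(z + w) - Phi(z)) ** (k - 1), -8.0, 8.0, 400)

def scale_pdf(s, nu):
    """Density of s = sqrt(chi2_nu / nu), computed in log space.

    Returns 0 at s <= 0; that is harmless here because the integrand
    is multiplied by range_cdf(0) = 0 at that point anyway.
    """
    if s <= 0.0:
        return 0.0
    log_f = (0.5 * nu * math.log(nu) - math.lgamma(0.5 * nu)
             - (0.5 * nu - 1.0) * math.log(2.0)
             + (nu - 1.0) * math.log(s) - 0.5 * nu * s * s)
    return math.exp(log_f)

def studentized_range_cdf(q, k, nu):
    """Outer integral: average range_cdf(q * s) over the density of s."""
    if q <= 0.0:
        return 0.0
    s_max = 1.0 + 10.0 / math.sqrt(nu)  # assumed cutoff for the upper tail
    return simpson(
        lambda s: scale_pdf(s, nu) * range_cdf(q * s, k), 0.0, s_max, 200)

# k = 2, df = 10: sqrt(2) times the two-tailed .05 critical t (2.228).
print(studentized_range_cdf(math.sqrt(2.0) * 2.228, 2, 10))
```

The k = 2 case is a convenient check, because the critical q there should equal sqrt(2) times the two-tailed critical t at the same df, so the printed value should land near 0.95.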
Cool. Thanks for finding and sharing the Fortran code.
I have something in the works on this if you would like to putz with it: https://github.com/Beakerboy/math-php/blob/StudentizedRange/src/Probability/Distribution/Continuous/StudentizedRange.php
Thanks for continuing to work on this!