statsmodels
ENH: tukeyhsd, pairwise comparison from summary statistics
https://stackoverflow.com/questions/50070927/anova-table-and-pairwise-test
Tukey HSD based on mean and var/std as sufficient statistics.
I have not looked at it, but for one-way ANOVA this should be possible and easy to add.
Related: #4379 (tukey-hsd improvements), #4323 (discussion of two-way ANOVA).
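To illustrate why means, variances and group sizes are sufficient for the one-way case, here is a minimal sketch that builds the ANOVA sums of squares and the F statistic from summary statistics alone. The function name `anova_from_summary` and its return keys are hypothetical, not statsmodels API, and the variances are assumed to be sample variances (ddof=1).

```python
def anova_from_summary(means, variances, ns):
    """One-way ANOVA quantities from per-group means, variances and sizes.

    Assumes `variances` are sample variances computed with ddof=1.
    """
    k = len(means)
    n_total = sum(ns)
    # Grand mean is the size-weighted average of the group means.
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total
    # Between-group sum of squares uses only the group means and sizes.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    # Within-group sum of squares is recovered from the group variances.
    ss_within = sum((n - 1) * v for n, v in zip(ns, variances))
    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within  # this is the pooled variance
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "f_stat": ms_between / ms_within,
    }


if __name__ == "__main__":
    print(anova_from_summary([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], [5, 5, 5]))
```

The within-group mean square is the same pooled variance that a Tukey HSD comparison would use as its error variance.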
Is anyone working on it?
Just in case anyone needs a quick hack, this worked for me. The output is a tuple instead of TukeyHSDResults. For some reason, putting the vector of variances directly into tukeyhsd() gave crazy results; it only works when var_all is a scalar.
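import numpy as np
from statsmodels.sandbox.stats.multicomp import tukeyhsd

def tukey_from_summary(means, stdevs, ns, alpha=.05):
    # Convert inputs so that plain lists work as well as arrays.
    means, stdevs, ns = np.asarray(means), np.asarray(stdevs), np.asarray(ns)
    variances = stdevs**2
    # Pool the group variances, weighting each by its degrees of freedom (n_i - 1).
    pooled_var = np.sum(variances * (ns - 1)) / np.sum(ns - 1)
    # Pass the pooled variance as a scalar; passing the vector gave wrong results.
    res = tukeyhsd(mean_all=means, nobs_all=ns, var_all=pooled_var, alpha=alpha, q_crit=None)
    return res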
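As a way to sanity-check the hack above without going through tukeyhsd(), the pairwise studentized range statistics can be computed by hand from the same summary statistics. This is a rough sketch using only the standard library; `tukey_q_from_summary` is a hypothetical name, the standard deviations are assumed to be sample values (ddof=1), and the Tukey-Kramer standard error is used so that unequal group sizes are handled.

```python
from itertools import combinations
from math import sqrt


def tukey_q_from_summary(means, stdevs, ns):
    """Pooled variance, error df, and per-pair studentized range statistics.

    Assumes `stdevs` are sample standard deviations (ddof=1).
    """
    k = len(means)
    df = sum(ns) - k
    # Pooled variance: group variances weighted by (n_i - 1).
    pooled_var = sum((n - 1) * s ** 2 for n, s in zip(ns, stdevs)) / df
    rows = []
    for i, j in combinations(range(k), 2):
        # Tukey-Kramer standard error for the pair (i, j).
        se = sqrt(pooled_var / 2.0 * (1.0 / ns[i] + 1.0 / ns[j]))
        diff = means[j] - means[i]
        rows.append((i, j, diff, abs(diff) / se))
    return pooled_var, df, rows


if __name__ == "__main__":
    pooled_var, df, rows = tukey_q_from_summary(
        [1.0, 2.0, 4.0], [1.0, 1.0, 1.0], [5, 5, 5]
    )
    for i, j, diff, q in rows:
        print(f"groups {i} vs {j}: meandiff={diff:.3f}, q={q:.3f}")
```

Each q value would then be compared with a critical value of the studentized range distribution for k groups and df error degrees of freedom. If SciPy 1.7 or later is available, `scipy.stats.studentized_range.ppf(1 - alpha, k, df)` should give that value.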
> putting the vector of variances directly into tukeyhsd() gave crazy results, only works when var_all is a scalar.
Sounds like a bug. There were some bugs in my initial var correction functions. Maybe they have not all been fixed.
Check during refactoring #8396.
Helper functions like the var computation need more unit tests, and possibly removal of code duplication.