scikit-posthocs
Question: Post-hoc Dunn test returns no significant results
Hi! Thanks a lot for creating this analysis tool.
I would like to check whether it is normal for a post-hoc Dunn test, run after a Kruskal-Wallis test, to return no significant result for any of the pairwise comparisons.
Another question: does Dunn's test require a multiple-comparison correction? Either way (with or without correction), I don't get any significant result, even though the Kruskal-Wallis test rejects the null hypothesis.
Hey! I think this may be due to the insufficient statistical power of Dunn's test. You can try Conover's test instead. I will look for relevant research on this. And yes, both Dunn's and Conover's tests require p-value correction.
@maximtrp Thanks for your reply! I just tried Conover's test and the result is the same. In fact, the corrected pairwise p-values are even higher with Conover's test.
I followed the test with Bonferroni-Holm correction (p_adjust='holm').
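The correction discussed above can be sketched in a few lines. This is the standard Holm step-down procedure next to plain Bonferroni, written in plain Python; it is not scikit-posthocs' own code, and the helper names bonferroni_adjust and holm_adjust are hypothetical. With four groups there are six pairwise comparisons, so under Bonferroni a raw p-value has to fall below 0.05 / 6 ≈ 0.0083 to remain significant at the 5% level.

```python
def bonferroni_adjust(pvalues):
    """Multiply every p-value by the number of comparisons, capped at 1."""
    m = len(pvalues)
    return [min(1.0, m * p) for p in pvalues]


def holm_adjust(pvalues):
    """Holm step-down adjustment; results are returned in the input order."""
    m = len(pvalues)
    # Visit the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for step, idx in enumerate(order):
        # The k-th smallest p-value (k = step + 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - step) * pvalues[idx])
        # Adjusted values must not decrease along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Four hypothetical raw p-values, for illustration only.
raw = [0.01, 0.04, 0.03, 0.005]
print([round(p, 4) for p in bonferroni_adjust(raw)])  # [0.04, 0.16, 0.12, 0.02]
print([round(p, 4) for p in holm_adjust(raw)])        # [0.03, 0.06, 0.06, 0.02]
```

Holm's adjusted values are never larger than Bonferroni's, but the smallest p-value is still multiplied by the full number of comparisons, so switching from Bonferroni to Holm cannot rescue a comparison whose raw p-value is far above the threshold.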
Hi,
I ran into the same situation recently. Below is the example from [1]: four algorithms compared on 14 datasets.
```python
import numpy as np
import scikit_posthocs as sp

# Rows are the 14 datasets, columns are the 4 algorithms.
data = np.array([
    [0.763, 0.768, 0.771, 0.798],
    [0.599, 0.591, 0.590, 0.569],
    [0.954, 0.971, 0.968, 0.967],
    [0.628, 0.661, 0.654, 0.657],
    [0.882, 0.888, 0.886, 0.898],
    [0.936, 0.931, 0.916, 0.931],
    [0.661, 0.668, 0.609, 0.685],
    [0.583, 0.583, 0.563, 0.625],
    [0.775, 0.838, 0.866, 0.875],
    [1.000, 1.000, 1.000, 1.000],
    [0.940, 0.962, 0.965, 0.962],
    [0.619, 0.666, 0.614, 0.669],
    [0.972, 0.981, 0.975, 0.975],
    [0.957, 0.978, 0.946, 0.970],
])

# data.T has one row per algorithm, so each algorithm becomes one group of 14 scores.
sp.posthoc_dunn(data.T, p_adjust='bonferroni')
```
But it returns what look like meaningless results, with every entry equal to 1.0:
| | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| 1 | 1.0 | 1.0 | 1.0 | 1.0 |
| 2 | 1.0 | 1.0 | 1.0 | 1.0 |
| 3 | 1.0 | 1.0 | 1.0 | 1.0 |
| 4 | 1.0 | 1.0 | 1.0 | 1.0 |
I noticed that posthoc_dunn ranks all values in the data matrix together, whereas [1] ranks the values within each row (dataset). Does this make a difference?
Thanks a lot!
[1] Demšar, J. Statistical comparisons of classifiers over multiple data sets. JMLR, 2006.
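The difference between the two ranking schemes can be seen on a small slice of the table above. The sketch below, in plain Python, ranks the first three datasets in two ways: pooled (every cell ranked together, as Kruskal-Wallis and Dunn do) and within each row (as the Friedman test recommended in [1] does). The helper names are hypothetical, ranks are ascending (the lowest score gets rank 1), and ties receive average ranks.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pooled_mean_ranks(rows):
    """Rank every cell of the table together, then average each column's ranks."""
    n_cols = len(rows[0])
    flat = [v for row in rows for v in row]  # row-major order
    ranks = average_ranks(flat)
    return [sum(ranks[r * n_cols + c] for r in range(len(rows))) / len(rows)
            for c in range(n_cols)]


def within_row_mean_ranks(rows):
    """Rank the cells of each row separately, then average each column's ranks."""
    per_row = [average_ranks(row) for row in rows]
    return [sum(pr[c] for pr in per_row) / len(rows) for c in range(len(rows[0]))]


# First three datasets (rows) of the example; columns are the four algorithms.
rows = [
    [0.763, 0.768, 0.771, 0.798],
    [0.599, 0.591, 0.590, 0.569],
    [0.954, 0.971, 0.968, 0.967],
]
print([round(x, 2) for x in pooled_mean_ranks(rows)])      # [6.0, 7.0, 6.67, 6.33]
print([round(x, 2) for x in within_row_mean_ranks(rows)])  # [2.0, 3.0, 2.67, 2.33]
```

In this tiny slice the column order is the same under both schemes, but with pooled ranking the ranks span 1..12 while the column means differ by at most 1, because most of the spread comes from the differences between datasets rather than between algorithms.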
I have checked the implementation and found no errors. In the original paper, Dunn ranks all observations together, across all groups.
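To make this concrete, here is a plain-Python sketch of Dunn's z statistic as it is usually stated: all observations are ranked together, each group's mean rank is computed, and each pairwise difference of mean ranks is divided by sqrt([N(N+1)/12 - Σ(t³ - t)/(12(N - 1))] · (1/n_i + 1/n_j)), where N is the total sample size and t runs over the sizes of groups of tied values. The function name dunn_test is hypothetical and this is not the library's source; it returns unadjusted two-sided p-values, to which a correction such as Holm or Bonferroni would then be applied.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def dunn_test(groups):
    """Unadjusted two-sided Dunn p-values for every pair of groups (pooled ranks)."""
    flat = [v for g in groups for v in g]
    n_total = len(flat)
    ranks = average_ranks(flat)

    # Mean rank of each group, reading the pooled ranks back group by group.
    mean_ranks, start = [], 0
    for g in groups:
        mean_ranks.append(sum(ranks[start:start + len(g)]) / len(g))
        start += len(g)

    # Tie correction: sum of (t^3 - t) over each set of t tied values.
    counts = {}
    for v in flat:
        counts[v] = counts.get(v, 0) + 1
    tie_sum = sum(t ** 3 - t for t in counts.values())
    variance = n_total * (n_total + 1) / 12 - tie_sum / (12 * (n_total - 1))

    pvalues = {}
    for i, j in combinations(range(len(groups)), 2):
        se = math.sqrt(variance * (1 / len(groups[i]) + 1 / len(groups[j])))
        z = (mean_ranks[i] - mean_ranks[j]) / se
        # Two-sided normal p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
        pvalues[(i, j)] = math.erfc(abs(z) / math.sqrt(2))
    return pvalues


# Toy groups, for illustration only.
for pair, p in dunn_test([[1, 2, 3], [4, 5, 6], [7, 8, 9]]).items():
    print(pair, round(p, 4))
```

The variance term depends only on N and the ties, not on how the data are blocked. If most of the variation in the scores comes from the datasets rather than the algorithms, the groups' mean ranks end up close together relative to that variance and the z statistics stay small.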