posthoc_tukey yields p-values of 0.900 and 0.001, which also differ from scipy.stats.tukey_hsd
First, thank you very much for this very useful package.
Describe the bug
scikit_posthocs.posthoc_tukey gives unexpected p-values of '0.001' and '0.900' with one dataset (see below).
scipy.stats.tukey_hsd gives similar but not identical numbers.
Note that the groups do not all have the same n.
Dataset: see the code below.
To Reproduce
# bug report
import scikit_posthocs as sp
import scipy.stats
import pandas as pd

data_a = [0.08331362, 0.22462052, 0.44619224, 0.34004518, 0.03146107,
          0.15828442, 0.27876282, 0.14699693, 0.3870986, 0.33669976,
          0.38822324, 0.28127964, 0.04101782, 0.31787209, 0.20165472,
          0.40043812, 0.50580976, 0.20009951]
data_b = [0.14693014, 0.0055596, 0.19977264, -0.30859794, 0.017286,
          0.05342739, -0.09502465, 0.01998256, 0.06162499, 0.18634389,
          0.34667326, 0.06702727, 0.14268381, 0.13141426, 0.06344518,
          0.04185783, 0.18701589, -0.06134188, 0.02844774]
data_c = [0.14727163, 0.10290732, -0.09934048, 0.06231107, -0.06754609,
          0.04739071, 0.19232889, 0.03198218, 0.11590822, 0.08816257,
          0.05692482, 0.04922897, -0.06524353, 0.08966288, 0.12975986,
          -0.08346692, 0.02827149, 0.15724036, 0.05327535]

all_data = [data_a, data_b, data_c]
labels = ['a', 'b', 'c']

# build a long-format DataFrame with one 'Group'/'Data' row per observation
df = pd.DataFrame()
for i in range(3):
    cur_dict = {'Group': [labels[i]] * len(all_data[i]),
                'Data': all_data[i]}
    cur_df = pd.DataFrame(cur_dict)
    df = pd.concat([cur_df, df], ignore_index=True)
df.reset_index()  # note: result not assigned, so this line has no effect

print(sp.posthoc_tukey(df, val_col='Data', group_col='Group'))
This yields:
       c      b      a
c  1.000  0.900  0.001
b  0.900  1.000  0.001
a  0.001  0.001  1.000
These seem like very unlikely values, right?
In contrast, the scipy function yields:
print(scipy.stats.tukey_hsd(data_a, data_b, data_c))
Tukey's HSD Pairwise Group Comparisons (95.0% Confidence Interval)
Comparison Statistic p-value Lower CI Upper CI
(0 - 1) 0.200 0.000 0.103 0.297
(0 - 2) 0.210 0.000 0.114 0.307
(1 - 0) -0.200 0.000 -0.297 -0.103
(1 - 2) 0.010 0.963 -0.085 0.106
(2 - 0) -0.210 0.000 -0.307 -0.114
(2 - 1) -0.010 0.963 -0.106 0.085
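As an aside, the 0.000 entries in this printout appear to be rounded for display; if it helps, the unrounded values can be read off the result object returned by scipy.stats.tukey_hsd. A minimal sketch, continuing from the reproduction code above and using the documented pvalue and statistic attributes:

# inspect the full-precision results instead of the rounded string output
res = scipy.stats.tukey_hsd(data_a, data_b, data_c)
print(res.pvalue)     # 3x3 array of pairwise p-values, not rounded
print(res.statistic)  # 3x3 array of pairwise mean differences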
Expected behavior
I am not sure what the correct result is, but it seems unlikely that the p-value is exactly '0.001' for two of the comparisons.
Also, it is unclear why the Tukey test in scikit-posthocs gives a different result than the scipy version.
System and package information:
- OS: Windows 10 Pro
- Package versions:
  - scikit-posthocs 0.7.0 pyhd8ed1ab_0 conda-forge
  - scipy 1.10.1 py310h309d312_1
Additional context
Other datasets give different results with more plausible p-values, such as:
          a         b         c
a  1.000000  0.409612  0.001678
b  0.409612  1.000000  0.053077
c  0.001678  0.053077  1.000000
Hi! Thank you for reporting this. You should be using the posthoc_tukey_hsd function instead of posthoc_tukey. That said, I do not like how either of them is implemented.
SciPy's function was added much later than ours; now we can reimplement the pairwise Tukey test based on it.
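For reference, a minimal sketch of the suggested call on the long-format DataFrame built in the reproduction code; this assumes posthoc_tukey_hsd takes a value array and a group-label array, and the exact format of the returned matrix may differ between scikit-posthocs versions:

# sketch of the suggested call; `df` is the DataFrame from the reproduction code above
result = sp.posthoc_tukey_hsd(df['Data'], df['Group'])
print(result)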