posthoc_tukey yields p-values of 0.900 and 0.001, which also differ from scipy.stats.tukey_hsd

Open DavidTadres opened this issue 2 years ago • 1 comments

First, thank you very much for this very useful package.

Describe the bug: scikit_posthocs.posthoc_tukey gives unexpected p-values of exactly '0.001' and '0.900' for one dataset (see below).

scipy.stats.tukey_hsd gives similar but not identical numbers.

Note that the groups do not have identical sample sizes (n = 18, 19 and 19).

Dataset: see code below.

To Reproduce

# bug report
import scikit_posthocs as sp
import scipy.stats
import pandas as pd

data_a = [0.08331362, 0.22462052, 0.44619224, 0.34004518, 0.03146107,
           0.15828442, 0.27876282, 0.14699693, 0.3870986 , 0.33669976,
           0.38822324, 0.28127964, 0.04101782, 0.31787209, 0.20165472,
           0.40043812, 0.50580976, 0.20009951]

data_b = [ 0.14693014,  0.0055596 ,  0.19977264, -0.30859794,  0.017286  ,
            0.05342739, -0.09502465,  0.01998256,  0.06162499,  0.18634389,
            0.34667326,  0.06702727,  0.14268381,  0.13141426,  0.06344518,
            0.04185783,  0.18701589, -0.06134188,  0.02844774]

data_c = [ 0.14727163,  0.10290732, -0.09934048,  0.06231107, -0.06754609,
            0.04739071,  0.19232889,  0.03198218,  0.11590822,  0.08816257,
            0.05692482,  0.04922897, -0.06524353,  0.08966288,  0.12975986,
           -0.08346692,  0.02827149,  0.15724036,  0.05327535]

all_data = [data_a, data_b, data_c]
labels = ['a', 'b', 'c']

df = pd.DataFrame()
for i in range(3):
    cur_dict = {'Group': [labels[i]] * len(all_data[i]),
                'Data': all_data[i]}
    
    cur_df = pd.DataFrame(cur_dict)
    
    # Prepend each group; ignore_index=True already renumbers the rows
    df = pd.concat([cur_df, df], ignore_index=True)
    
print(sp.posthoc_tukey(df,
                       val_col='Data',
                       group_col='Group'))

This yields:

       c      b      a
c  1.000  0.900  0.001
b  0.900  1.000  0.001
a  0.001  0.001  1.000

These seem like very unlikely values, right?

In contrast, the scipy library yields:

print(scipy.stats.tukey_hsd(data_a, data_b, data_c))
Tukey's HSD Pairwise Group Comparisons (95.0% Confidence Interval)
Comparison  Statistic  p-value  Lower CI  Upper CI
 (0 - 1)      0.200     0.000     0.103     0.297
 (0 - 2)      0.210     0.000     0.114     0.307
 (1 - 0)     -0.200     0.000    -0.297    -0.103
 (1 - 2)      0.010     0.963    -0.085     0.106
 (2 - 0)     -0.210     0.000    -0.307    -0.114
 (2 - 1)     -0.010     0.963    -0.106     0.085

Expected behavior: I am not sure what the correct result is, but it seems unlikely that the resulting p-value is exactly '0.001' for two comparisons.

Also, it's unclear why the Tukey test in scikit-posthocs gives a different result from the scipy version.
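One way to judge which output is closer to the textbook procedure is to compute the Tukey-Kramer p-values by hand from the studentized range distribution. The sketch below is only a cross-check under stated assumptions: `tukey_kramer_pvalues` is a hypothetical helper name, the three short samples are toy data rather than the dataset above, and it relies on `scipy.stats.studentized_range`, which needs a reasonably recent scipy.

```python
import numpy as np
from scipy.stats import studentized_range, tukey_hsd


def tukey_kramer_pvalues(*groups):
    """Pairwise Tukey-Kramer p-values computed by hand (illustrative helper)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    sizes = np.array([g.size for g in groups])
    means = np.array([g.mean() for g in groups])
    dof = sizes.sum() - k  # within-group degrees of freedom

    # Pooled within-group variance (mean squared error of a one-way ANOVA)
    mse = sum(((g - g.mean()) ** 2).sum() for g in groups) / dof

    # Standard error per pair; the 1/n_i + 1/n_j term handles unequal n
    se = np.sqrt(mse / 2.0 * (1.0 / sizes[:, None] + 1.0 / sizes[None, :]))

    # Studentized range statistic for every pair of group means
    q = np.abs(means[:, None] - means[None, :]) / se

    # Upper-tail probability of the studentized range distribution
    return studentized_range.sf(q, k, dof)


# Toy samples with unequal sizes (not the dataset from this report)
a = [0.08, 0.22, 0.45, 0.34, 0.03]
b = [0.15, 0.01, 0.20, -0.31]
c = [0.15, 0.10, -0.10, 0.06, -0.07, 0.05]

by_hand = tukey_kramer_pvalues(a, b, c)
reference = tukey_hsd(a, b, c).pvalue

print(by_hand)
print(reference)
```

If the two printed matrices agree, scipy's numbers can be treated as the reference when comparing against the posthoc_tukey output.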

System and package information (please complete the following information):

  • OS: Windows 10 Pro
  • Package version:
    • scikit-posthocs 0.7.0 pyhd8ed1ab_0 conda-forge
    • scipy 1.10.1 py310h309d312_1

Additional context: Other datasets give different results with more plausible p-values, such as:

          a         b         c
a  1.000000  0.409612  0.001678
b  0.409612  1.000000  0.053077
c  0.001678  0.053077  1.000000

DavidTadres, Jun 21 '23 23:06

Hi! Thank you for reporting this. You should be using the posthoc_tukey_hsd function instead of posthoc_tukey. That said, I am not happy with how either of them is implemented.

Scipy's function was added much later than these, so we can now reimplement the pairwise Tukey test on top of it.
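As a rough illustration of that idea, a thin wrapper around `scipy.stats.tukey_hsd` could return the same kind of labeled matrix that posthoc_tukey prints. This is only a sketch, not the package's implementation: `tukey_hsd_frame` is a hypothetical name, the argument names merely mirror the `val_col` / `group_col` convention used in the report, and the small frame at the end is toy data.

```python
import pandas as pd
from scipy.stats import tukey_hsd


def tukey_hsd_frame(df, val_col, group_col):
    """Hypothetical helper: Tukey HSD p-values as a labeled square DataFrame."""
    # Keep groups in order of first appearance so labels match the input
    labels = list(df[group_col].unique())

    # One 1-D sample per group, in the same order as the labels
    samples = [df.loc[df[group_col] == g, val_col].to_numpy() for g in labels]

    # scipy returns a k x k matrix of pairwise p-values
    result = tukey_hsd(*samples)
    return pd.DataFrame(result.pvalue, index=labels, columns=labels)


# Toy long-format data (not the dataset from this report)
toy = pd.DataFrame({
    "Group": ["a"] * 5 + ["b"] * 4 + ["c"] * 6,
    "Data": [0.08, 0.22, 0.45, 0.34, 0.03,
             0.15, 0.01, 0.20, -0.31,
             0.15, 0.10, -0.10, 0.06, -0.07, 0.05],
})

print(tukey_hsd_frame(toy, val_col="Data", group_col="Group"))
```

The p-values come straight from scipy, so the result should match `scipy.stats.tukey_hsd` called on the same samples.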

maximtrp, Jun 22 '23 04:06