
Add the statistics of hypothesis testing

Open 260147169 opened this issue 1 year ago • 2 comments

Add an option 'test_stat' to display the test statistics from hypothesis testing (default: False). The statistics are already computed, so this option only controls whether they are displayed.

260147169 avatar Oct 07 '22 16:10 260147169

Thanks for the idea and the PR. A couple suggestions:

Using the README.md example:


import pandas as pd
from tableone import TableOne, load_dataset

# Sample dataset and settings from the README.md example
data = load_dataset('pn2012')
columns = ['Age', 'SysABP', 'Height', 'Weight', 'ICU', 'death']
categorical = ['ICU', 'death']
groupby = ['death']
nonnormal = ['Age']
labels = {'death': 'mortality'}
mytable = TableOne(data, columns=columns, categorical=categorical, groupby=groupby, nonnormal=nonnormal, rename=labels, pval=True, test_stat=True)

Works, but the table could use a little cleanup:

                         Grouped by mortality
                                      Missing           Overall                 0                 1 Test-stat P-Value
n                                                          1000               864               136
Age, median [Q1,Q3]                         0  68.0 [53.0,79.0]  66.0 [52.8,78.0]  75.0 [62.0,83.0]    23.882  <0.001
SysABP, mean (SD)                         291      114.3 (40.2)      115.4 (38.3)      107.6 (49.4)     1.510   0.134
Height, mean (SD)                         475      170.1 (22.1)      170.3 (23.2)      168.5 (11.3)     1.030   0.304
Weight, mean (SD)                         302       82.9 (23.8)       83.0 (23.6)       82.3 (25.4)     0.277   0.782
ICU, n (%)          CCU                     0        162 (16.2)        137 (15.9)         25 (18.4)    20.093  <0.001
                    CSRU                             202 (20.2)        194 (22.5)           8 (5.9)    20.093
                    MICU                             380 (38.0)        318 (36.8)         62 (45.6)    20.093
                    SICU                             256 (25.6)        215 (24.9)         41 (30.1)    20.093
mortality, n (%)    0                       0        864 (86.4)       864 (100.0)                     991.508  <0.001
                    1                                136 (13.6)                         136 (100.0)   991.508

There is some redundancy in the Test-stat column. Each variable should show only one test statistic, as is done for the p-value.
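One way to get that layout, as a rough sketch only: blank every repeat of the statistic after the first row of each variable. The toy table and the index level names (`variable`, `level`) below are assumptions for illustration, not tableone's internal structure.

```python
import pandas as pd

# Toy table with a (variable, level) MultiIndex, mimicking the layout above.
# The level names are made up for this sketch.
index = pd.MultiIndex.from_tuples(
    [("ICU", "CCU"), ("ICU", "CSRU"), ("ICU", "MICU"), ("ICU", "SICU"),
     ("mortality", "0"), ("mortality", "1")],
    names=["variable", "level"],
)
table = pd.DataFrame(
    {"Test-stat": ["20.093"] * 4 + ["991.508"] * 2},
    index=index,
)

# True for every row whose variable name has already appeared,
# i.e. every row after the first one within each variable.
repeated = table.index.get_level_values("variable").duplicated()

# Blank the repeats so each variable shows its statistic once.
table.loc[repeated, "Test-stat"] = ""

print(table)
```

This relies on variable names being unique in the table, which is what makes `duplicated()` equivalent to "not the first row of the variable".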

Changing pval to False breaks it:

 mytable = TableOne(data, columns=columns, categorical=categorical, groupby=groupby, nonnormal=nonnormal, rename=labels, pval=False,test_stat=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jraffa/temp/tableone/tableone/tableone.py", line 424, in __init__
    self.cat_table = self._create_cat_table(data, overall)
  File "/home/jraffa/temp/tableone/tableone/tableone.py", line 1348, in _create_cat_table
    table = table.join(self._htest_table[['Test-stat']])
AttributeError: 'TableOne' object has no attribute '_htest_table'
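
The traceback suggests that `_htest_table` is only created when pval=True. A minimal sketch of the kind of guard that would avoid this, using a toy class rather than the real TableOne (the class, method, and values below are hypothetical):

```python
class MiniTable:
    """Toy stand-in for illustration only; not tableone's real class."""

    def __init__(self, pval=False, test_stat=False):
        self._pval = pval
        self._test_stat = test_stat
        # Build the hypothesis-test results when EITHER option needs them,
        # so test_stat=True no longer depends on pval=True.
        if pval or test_stat:
            # Placeholder values; a real implementation would run the tests.
            self._htest_table = {"Test-stat": "1.510", "P-Value": "0.134"}

    def extra_columns(self):
        """Return only the hypothesis-test columns that were requested."""
        wanted = []
        if self._test_stat:
            wanted.append("Test-stat")
        if self._pval:
            wanted.append("P-Value")
        return {name: self._htest_table[name] for name in wanted}


# The combination that fails above works in the sketch.
print(MiniTable(pval=False, test_stat=True).extra_columns())
```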

I also don't think the Fisher test is handled appropriately. There really isn't a test stat for it, so it should be blank, but I believe it reports the Chisq's test statistic and the Fisher p-value:

td = pd.DataFrame({'a':[0,0,0,1]*10 + [1],'b':[1,1,1,1]*10 + [0]})
TableOne(td,columns=['a','b'],categorical=['a','b'],pval=True,groupby="b",test_stat=True)
           Grouped by b
                Missing    Overall          0           1 Test-stat P-Value
n                               41          1          40
a, n (%) 1            0  11 (26.8)  1 (100.0)   10 (25.0)     0.280   0.268
         0               30 (73.2)              30 (75.0)     0.280
b, n (%) 0            0    1 (2.4)  1 (100.0)                 9.744   0.024
         1               40 (97.6)             40 (100.0)     9.744
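
To check this outside of tableone, the two tests can be run directly with SciPy on the same toy data. This is only a sketch of the comparison, not how tableone calls them internally. Note that `fisher_exact` returns an odds ratio and a p-value, not a test statistic.

```python
import pandas as pd
from scipy import stats

# Same toy data as above.
td = pd.DataFrame({'a': [0, 0, 0, 1] * 10 + [1], 'b': [1, 1, 1, 1] * 10 + [0]})

# 2x2 contingency table of a against b.
observed = pd.crosstab(td['a'], td['b'])

# Chi-squared test: returns a test statistic and a p-value
# (Yates' continuity correction is applied by default for 2x2 tables).
chi2, chi2_p, dof, expected = stats.chi2_contingency(observed)

# Fisher's exact test: the first value is the sample odds ratio,
# so there is nothing that belongs in a Test-stat column.
odds_ratio, fisher_p = stats.fisher_exact(observed)

print(f"chi-squared statistic: {chi2:.3f}")
print(f"Fisher p-value:        {fisher_p:.3f}")
```

For variable a these should match the 0.280 and 0.268 shown in the table, which is consistent with the statistic coming from one test and the p-value from the other.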

I think the t-test, ANOVA, MW, and KW all have test stats. @tompollard, are there any other tests we should worry about? I don't think the mode test is reported like this, so it should be safe.
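
As a quick sanity check, the SciPy versions of these four tests each return a statistic alongside the p-value. The two samples below are made-up numbers for illustration, and this sketch assumes the tests map to these SciPy functions; it is not a statement about tableone's internals.

```python
from scipy import stats

# Two made-up samples, for illustration only.
group_a = [5.1, 4.9, 6.2, 5.8, 6.0, 5.5]
group_b = [6.8, 7.1, 6.5, 7.4, 6.9, 7.0]

tests = {
    "t-test": stats.ttest_ind(group_a, group_b),
    "ANOVA": stats.f_oneway(group_a, group_b),
    "Mann-Whitney U": stats.mannwhitneyu(group_a, group_b),
    "Kruskal-Wallis": stats.kruskal(group_a, group_b),
}

# Every result exposes both a test statistic and a p-value.
for name, result in tests.items():
    print(f"{name}: statistic={result.statistic:.3f}, p={result.pvalue:.3f}")
```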

jraffa avatar Oct 11 '22 18:10 jraffa

Thanks so much to collaborator @jraffa and owner @tompollard for the help and suggestions. The update contains the following:

1. The redundancy has been cleaned up: each variable now shows a single test statistic, as with the p-value. The code is the same as above. The results are as follows:

                                      Missing           Overall                 0                 1 Test-stat P-Value
n                                                          1000               864               136
Age, median [Q1,Q3]                         0  68.0 [53.0,79.0]  66.0 [52.8,78.0]  75.0 [62.0,83.0]    23.882  <0.001
SysABP, mean (SD)                         291      114.3 (40.2)      115.4 (38.3)      107.6 (49.4)     1.510   0.134
Height, mean (SD)                         475      170.1 (22.1)      170.3 (23.2)      168.5 (11.3)     1.030   0.304
Weight, mean (SD)                         302       82.9 (23.8)       83.0 (23.6)       82.3 (25.4)     0.277   0.782
ICU, n (%)          CCU                     0        162 (16.2)        137 (15.9)         25 (18.4)    20.093  <0.001
                    CSRU                             202 (20.2)        194 (22.5)           8 (5.9)
                    MICU                             380 (38.0)        318 (36.8)         62 (45.6)
                    SICU                             256 (25.6)        215 (24.9)         41 (30.1)
mortality, n (%)    0                       0        864 (86.4)       864 (100.0)                     991.508  <0.001
                    1                                136 (13.6)                         136 (100.0)

2. When pval=False, it no longer breaks:

mytable = TableOne(data, columns=columns, categorical=categorical, groupby=groupby, nonnormal=nonnormal, rename=labels, pval=False,test_stat=True)

                                      Missing           Overall                 0                 1 Test-stat
n                                                          1000               864               136
Age, median [Q1,Q3]                         0  68.0 [53.0,79.0]  66.0 [52.8,78.0]  75.0 [62.0,83.0]    23.882
SysABP, mean (SD)                         291      114.3 (40.2)      115.4 (38.3)      107.6 (49.4)     1.510
Height, mean (SD)                         475      170.1 (22.1)      170.3 (23.2)      168.5 (11.3)     1.030
Weight, mean (SD)                         302       82.9 (23.8)       83.0 (23.6)       82.3 (25.4)     0.277
ICU, n (%)          CCU                     0        162 (16.2)        137 (15.9)         25 (18.4)    20.093
                    CSRU                             202 (20.2)        194 (22.5)           8 (5.9)
                    MICU                             380 (38.0)        318 (36.8)         62 (45.6)
                    SICU                             256 (25.6)        215 (24.9)         41 (30.1)
mortality, n (%)    0                       0        864 (86.4)       864 (100.0)                     991.508
                    1                                136 (13.6)                         136 (100.0)

3. Fisher's test does not produce a test statistic, so its test_stat is set to None, and a warning message notifies the user:

                Missing    Overall          0           1 Test-stat
n                               41          1          40
a, n (%) 1            0  11 (26.8)  1 (100.0)   10 (25.0)       nan
         0               30 (73.2)              30 (75.0)
b, n (%) 0            0    1 (2.4)  1 (100.0)                   nan
         1               40 (97.6)             40 (100.0)

[1] Fisher's test did not compute statistics of hypothesis testing. The following variables are affected: a, b.

260147169 avatar Oct 22 '22 12:10 260147169