tableone
Add the statistics of hypothesis testing
Add an option `test_stat` (default: False) to display the statistics of the hypothesis tests. The statistics are already computed; this option only controls whether they are displayed.
Thanks for the idea and the PR. A couple of suggestions, using the example from the README.md:
```python
import pandas as pd
from tableone import TableOne, load_dataset

data = load_dataset('pn2012')
columns = ['Age', 'SysABP', 'Height', 'Weight', 'ICU', 'death']
categorical = ['ICU', 'death']
groupby = ['death']
nonnormal = ['Age']
labels = {'death': 'mortality'}
mytable = TableOne(data, columns=columns, categorical=categorical, groupby=groupby,
                   nonnormal=nonnormal, rename=labels, pval=True, test_stat=True)
```
Works, but the table could use a little cleanup:
Grouped by mortality

| | | Missing | Overall | 0 | 1 | Test-stat | P-Value |
|---|---|---|---|---|---|---|---|
| n | | | 1000 | 864 | 136 | | |
| Age, median [Q1,Q3] | | 0 | 68.0 [53.0,79.0] | 66.0 [52.8,78.0] | 75.0 [62.0,83.0] | 23.882 | <0.001 |
| SysABP, mean (SD) | | 291 | 114.3 (40.2) | 115.4 (38.3) | 107.6 (49.4) | 1.510 | 0.134 |
| Height, mean (SD) | | 475 | 170.1 (22.1) | 170.3 (23.2) | 168.5 (11.3) | 1.030 | 0.304 |
| Weight, mean (SD) | | 302 | 82.9 (23.8) | 83.0 (23.6) | 82.3 (25.4) | 0.277 | 0.782 |
| ICU, n (%) | CCU | 0 | 162 (16.2) | 137 (15.9) | 25 (18.4) | 20.093 | <0.001 |
| | CSRU | | 202 (20.2) | 194 (22.5) | 8 (5.9) | 20.093 | |
| | MICU | | 380 (38.0) | 318 (36.8) | 62 (45.6) | 20.093 | |
| | SICU | | 256 (25.6) | 215 (24.9) | 41 (30.1) | 20.093 | |
| mortality, n (%) | 0 | 0 | 864 (86.4) | 864 (100.0) | | 991.508 | <0.001 |
| | 1 | | 136 (13.6) | | 136 (100.0) | 991.508 | |
There is some redundancy in the Test-stat column: like the p-value, the test statistic should be shown only once per variable rather than repeated on every level.
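A minimal pandas sketch of the suggested cleanup, assuming the rows are indexed by a (variable, level) MultiIndex as in the printed table: the statistic is kept on the first row of each variable and blanked on the remaining levels, mirroring how the P-Value column is shown. The values are copied from the table above; the code is illustrative only and is not taken from tableone.

```python
import pandas as pd

# Rows mirroring the categorical part of the table above, with the
# test statistic repeated on every level of each variable.
index = pd.MultiIndex.from_tuples(
    [("ICU, n (%)", "CCU"), ("ICU, n (%)", "CSRU"),
     ("ICU, n (%)", "MICU"), ("ICU, n (%)", "SICU"),
     ("mortality, n (%)", "0"), ("mortality, n (%)", "1")],
    names=["variable", "level"],
)
table = pd.DataFrame(
    {"Test-stat": [20.093] * 4 + [991.508] * 2,
     "P-Value": ["<0.001", "", "", "", "<0.001", ""]},
    index=index,
)

# True only for the first row of each variable (the first index level).
first_row = ~table.index.get_level_values(0).duplicated()

# Keep the statistic on that first row; later levels get an empty string,
# matching how the P-Value column is displayed.
table["Test-stat"] = table["Test-stat"].where(first_row, "")
print(table)
```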
Changing `pval` to `False` breaks it:
```python
mytable = TableOne(data, columns=columns, categorical=categorical, groupby=groupby,
                   nonnormal=nonnormal, rename=labels, pval=False, test_stat=True)
```

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jraffa/temp/tableone/tableone/tableone.py", line 424, in __init__
    self.cat_table = self._create_cat_table(data, overall)
  File "/home/jraffa/temp/tableone/tableone/tableone.py", line 1348, in _create_cat_table
    table = table.join(self._htest_table[['Test-stat']])
AttributeError: 'TableOne' object has no attribute '_htest_table'
```
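The traceback suggests that `_create_cat_table` reads `self._htest_table` even though that attribute is only created when `pval=True`. One way to avoid this is to run the hypothesis tests whenever either `pval` or `test_stat` is requested, and let each flag control only which column is displayed. The sketch below uses a hypothetical, heavily simplified class (`TableSketch` is not part of tableone) with placeholder results.

```python
class TableSketch:
    """Hypothetical, heavily simplified stand-in for the TableOne constructor."""

    def __init__(self, pval=False, test_stat=False):
        self._pval = pval
        self._test_stat = test_stat
        # Run the hypothesis tests whenever either column is requested, so that
        # _htest_table exists before any table that reads from it is built.
        if pval or test_stat:
            self._htest_table = self._create_htest_table()
        self.cat_table = self._create_cat_table()

    def _create_htest_table(self):
        # Placeholder results: variable -> (statistic, p-value).
        return {"ICU": (20.093, "<0.001")}

    def _create_cat_table(self):
        table = {"ICU": {}}
        # Only read _htest_table when it was actually created above.
        if self._pval or self._test_stat:
            for var, (statistic, pvalue) in self._htest_table.items():
                if self._test_stat:
                    table[var]["Test-stat"] = statistic
                if self._pval:
                    table[var]["P-Value"] = pvalue
        return table


print(TableSketch(pval=False, test_stat=True).cat_table)
# {'ICU': {'Test-stat': 20.093}}
```

With this structure the statistics are computed once, and the two flags are purely display switches, which matches the intent of the option.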
I also don't think Fisher's exact test is handled appropriately. There really isn't a test statistic for it, so the Test-stat cell should be blank, but I believe the table reports the chi-squared test statistic together with the Fisher p-value:
```python
td = pd.DataFrame({'a': [0, 0, 0, 1] * 10 + [1], 'b': [1, 1, 1, 1] * 10 + [0]})
TableOne(td, columns=['a', 'b'], categorical=['a', 'b'], pval=True, groupby="b", test_stat=True)
```
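Grouped by b

| | | Missing | Overall | 0 | 1 | Test-stat | P-Value |
|---|---|---|---|---|---|---|---|
| n | | | 41 | 1 | 40 | | |
| a, n (%) | 1 | 0 | 11 (26.8) | 1 (100.0) | 10 (25.0) | 0.280 | 0.268 |
| | 0 | | 30 (73.2) | | 30 (75.0) | 0.280 | |
| b, n (%) | 0 | 0 | 1 (2.4) | 1 (100.0) | | 9.744 | 0.024 |
| | 1 | | 40 (97.6) | | 40 (100.0) | 9.744 | |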
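Recomputing the two 2x2 tables by hand is consistent with that reading: the Test-stat values match Pearson's chi-squared statistic with Yates' continuity correction, while the P-Values match the two-sided Fisher exact test. The sketch below uses only the standard library; the function names are hypothetical, rows are the levels of each variable, columns are the groups `b=0` and `b=1`, and the two-sided Fisher p-value sums every table with the same margins that is no more likely than the observed one.

```python
from math import comb


def yates_chi2(table):
    """Pearson chi-squared statistic for a 2x2 table with Yates' correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            # Continuity correction: shrink |O - E| by 0.5, but not below zero.
            stat += max(abs(observed - expected) - 0.5, 0.0) ** 2 / expected
    return stat


def fisher_two_sided_p(table):
    """Two-sided Fisher exact p-value for a 2x2 table (hypergeometric)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    r1, c1 = a + b, a + c  # first-row and first-column totals are held fixed

    def prob(x):
        # Probability of x in the top-left cell given the fixed margins.
        return comb(c1, x) * comb(n - c1, r1 - x) / comb(n, r1)

    p_obs = prob(a)
    support = range(max(0, r1 + c1 - n), min(r1, c1) + 1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7))


# Counts from the td example: rows are levels, columns are groups b=0, b=1.
table_a = [[1, 10], [0, 30]]  # a=1, a=0
table_b = [[1, 0], [0, 40]]   # b=0, b=1
for name, t in [("a", table_a), ("b", table_b)]:
    print(f"{name}: chi2={yates_chi2(t):.3f}, Fisher p={fisher_two_sided_p(t):.3f}")
# a: chi2=0.280, Fisher p=0.268
# b: chi2=9.744, Fisher p=0.024
```

Fisher's exact test computes the p-value directly from the hypergeometric probabilities, so there is no chi-squared-style statistic that belongs in the Test-stat column.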
I think the t-test, ANOVA, Mann-Whitney U (MW), and Kruskal-Wallis (KW) tests all have test statistics. @tompollard, are there any other tests we should worry about? I don't think the mode test is reported like this, so it should be safe.
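For reference, SciPy's implementations of these four tests each return a statistic alongside the p-value, while `scipy.stats.fisher_exact` returns the sample odds ratio as its first value rather than a test statistic of the same kind. This sketch assumes SciPy is installed; the three sample groups are made up purely for illustration.

```python
from scipy import stats

# Made-up sample values for three groups (illustration only).
group0 = [52, 61, 66, 70, 78]
group1 = [62, 75, 80, 83, 88]
group2 = [55, 59, 64, 71, 90]

# Each result unpacks into a (statistic, p-value) pair.
results = {
    "t-test": stats.ttest_ind(group0, group1),
    "ANOVA": stats.f_oneway(group0, group1, group2),
    "Mann-Whitney U": stats.mannwhitneyu(group0, group1),
    "Kruskal-Wallis": stats.kruskal(group0, group1, group2),
}
for name, (statistic, pvalue) in results.items():
    print(f"{name}: statistic={statistic:.3f}, p={pvalue:.3f}")

# fisher_exact also returns two numbers, but the first is the sample odds
# ratio, not a test statistic. The table is the 'a' example from above.
odds_ratio, p = stats.fisher_exact([[1, 10], [0, 30]])
print(f"Fisher: odds ratio={odds_ratio}, p={p:.3f}")
```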
Thanks so much to @jraffa (collaborator) and @tompollard (owner) for the help and suggestions. The update contains the following:
1. After cleaning up the redundancy, there is now a single test statistic per variable, as for the p-value. Using the same code as above, the results are as follows:
| | | Missing | Overall | 0 | 1 | Test-stat | P-Value |
|---|---|---|---|---|---|---|---|
| n | | | 1000 | 864 | 136 | | |
| Age, median [Q1,Q3] | | 0 | 68.0 [53.0,79.0] | 66.0 [52.8,78.0] | 75.0 [62.0,83.0] | 23.882 | <0.001 |
| SysABP, mean (SD) | | 291 | 114.3 (40.2) | 115.4 (38.3) | 107.6 (49.4) | 1.510 | 0.134 |
| Height, mean (SD) | | 475 | 170.1 (22.1) | 170.3 (23.2) | 168.5 (11.3) | 1.030 | 0.304 |
| Weight, mean (SD) | | 302 | 82.9 (23.8) | 83.0 (23.6) | 82.3 (25.4) | 0.277 | 0.782 |
| ICU, n (%) | CCU | 0 | 162 (16.2) | 137 (15.9) | 25 (18.4) | 20.093 | <0.001 |
| | CSRU | | 202 (20.2) | 194 (22.5) | 8 (5.9) | | |
| | MICU | | 380 (38.0) | 318 (36.8) | 62 (45.6) | | |
| | SICU | | 256 (25.6) | 215 (24.9) | 41 (30.1) | | |
| mortality, n (%) | 0 | 0 | 864 (86.4) | 864 (100.0) | | 991.508 | <0.001 |
| | 1 | | 136 (13.6) | | 136 (100.0) | | |
2. Setting `pval=False` no longer breaks the table:
```python
mytable = TableOne(data, columns=columns, categorical=categorical, groupby=groupby,
                   nonnormal=nonnormal, rename=labels, pval=False, test_stat=True)
```
| | | Missing | Overall | 0 | 1 | Test-stat |
|---|---|---|---|---|---|---|
| n | | | 1000 | 864 | 136 | |
| Age, median [Q1,Q3] | | 0 | 68.0 [53.0,79.0] | 66.0 [52.8,78.0] | 75.0 [62.0,83.0] | 23.882 |
| SysABP, mean (SD) | | 291 | 114.3 (40.2) | 115.4 (38.3) | 107.6 (49.4) | 1.510 |
| Height, mean (SD) | | 475 | 170.1 (22.1) | 170.3 (23.2) | 168.5 (11.3) | 1.030 |
| Weight, mean (SD) | | 302 | 82.9 (23.8) | 83.0 (23.6) | 82.3 (25.4) | 0.277 |
| ICU, n (%) | CCU | 0 | 162 (16.2) | 137 (15.9) | 25 (18.4) | 20.093 |
| | CSRU | | 202 (20.2) | 194 (22.5) | 8 (5.9) | |
| | MICU | | 380 (38.0) | 318 (36.8) | 62 (45.6) | |
| | SICU | | 256 (25.6) | 215 (24.9) | 41 (30.1) | |
| mortality, n (%) | 0 | 0 | 864 (86.4) | 864 (100.0) | | 991.508 |
| | 1 | | 136 (13.6) | | 136 (100.0) | |
3. Fisher's exact test does not compute a test statistic, so its `test_stat` value is set to `None` (shown as `nan` in the table), and a warning message is shown to the user:
| | | Missing | Overall | 0 | 1 | Test-stat |
|---|---|---|---|---|---|---|
| n | | | 41 | 1 | 40 | |
| a, n (%) | 1 | 0 | 11 (26.8) | 1 (100.0) | 10 (25.0) | nan |
| | 0 | | 30 (73.2) | | 30 (75.0) | |
| b, n (%) | 0 | 0 | 1 (2.4) | 1 (100.0) | | nan |
| | 1 | | 40 (97.6) | | 40 (100.0) | |
[1] Fisher's test did not compute statistics of hypothesis testing. The following variables are affected: a, b.
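A minimal sketch of the behaviour described in item 3, assuming the per-variable results are available as (test name, statistic) pairs: Fisher's exact test contributes NaN instead of a statistic, and the affected variables are collected into one footnote-style remark. The function name and the input format are hypothetical and not the PR's actual code; the ICU entry reuses the chi-squared statistic from the first example.

```python
import math


def collect_test_stats(results):
    """Map each variable to its displayed test statistic.

    results: dict of variable -> (test name, statistic). Hypothetical format.
    Returns the statistics plus a list of footnote-style remarks.
    """
    test_stats = {}
    fisher_vars = []
    for var, (test, statistic) in results.items():
        if test == "Fisher's exact":
            # No test statistic is reported for Fisher's exact test;
            # NaN renders as "nan" in the printed table.
            test_stats[var] = math.nan
            fisher_vars.append(var)
        else:
            test_stats[var] = statistic
    remarks = []
    if fisher_vars:
        remarks.append(
            "[1] Fisher's test did not compute statistics of hypothesis testing. "
            "The following variables are affected: " + ", ".join(fisher_vars) + "."
        )
    return test_stats, remarks


stats_out, remarks = collect_test_stats({
    "a": ("Fisher's exact", None),
    "b": ("Fisher's exact", None),
    "ICU": ("Chi-squared", 20.093),
})
print(stats_out)
print(remarks[0])
```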