tableone Add the statistics of hypothesis testing

Add an option 'test_stat' (default: False) to display the test statistics from hypothesis testing. The statistics are already computed; this option only controls whether they are displayed.

Oct 07 '22 16:10 260147169

Thanks for the idea and the PR. A couple suggestions:

Using the README.md example:


import pandas as pd
from tableone import TableOne, load_dataset

# Load the sample dataset used in the README
data = load_dataset('pn2012')
columns = ['Age', 'SysABP', 'Height', 'Weight', 'ICU', 'death']
categorical = ['ICU', 'death']
groupby = ['death']
nonnormal = ['Age']
labels = {'death': 'mortality'}
# test_stat is the option added by this PR
mytable = TableOne(data, columns=columns, categorical=categorical, groupby=groupby, nonnormal=nonnormal, rename=labels, pval=True, test_stat=True)

Works, but the table could use a little cleanup:

                         Grouped by mortality
                                      Missing           Overall                 0                 1 Test-stat P-Value
n                                                          1000               864               136
Age, median [Q1,Q3]                         0  68.0 [53.0,79.0]  66.0 [52.8,78.0]  75.0 [62.0,83.0]    23.882  <0.001
SysABP, mean (SD)                         291      114.3 (40.2)      115.4 (38.3)      107.6 (49.4)     1.510   0.134
Height, mean (SD)                         475      170.1 (22.1)      170.3 (23.2)      168.5 (11.3)     1.030   0.304
Weight, mean (SD)                         302       82.9 (23.8)       83.0 (23.6)       82.3 (25.4)     0.277   0.782
ICU, n (%)          CCU                     0        162 (16.2)        137 (15.9)         25 (18.4)    20.093  <0.001
                    CSRU                             202 (20.2)        194 (22.5)           8 (5.9)    20.093
                    MICU                             380 (38.0)        318 (36.8)         62 (45.6)    20.093
                    SICU                             256 (25.6)        215 (24.9)         41 (30.1)    20.093
mortality, n (%)    0                       0        864 (86.4)       864 (100.0)                     991.508  <0.001
                    1                                136 (13.6)                         136 (100.0)   991.508

There is some redundancy in the Test-stat column. The test statistic should appear only once per variable, as is already done for the p-value.
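One way to picture the cleanup is to keep the statistic only on the first row of each variable. The sketch below uses plain dictionaries as a stand-in for the table rows; `blank_repeated_stats` and the row layout are hypothetical and are not part of tableone.

```python
def blank_repeated_stats(rows, column="Test-stat"):
    """Keep `column` only on the first row seen for each variable.

    `rows` is a list of dicts with a "variable" key (hypothetical layout).
    """
    seen = set()
    cleaned = []
    for row in rows:
        row = dict(row)  # copy so the caller's data is left untouched
        if row["variable"] in seen:
            row[column] = ""  # repeated level: blank the duplicate value
        seen.add(row["variable"])
        cleaned.append(row)
    return cleaned


# Values copied from the ICU and mortality rows of the table above
rows = [
    {"variable": "ICU", "level": "CCU", "Test-stat": "20.093"},
    {"variable": "ICU", "level": "CSRU", "Test-stat": "20.093"},
    {"variable": "ICU", "level": "MICU", "Test-stat": "20.093"},
    {"variable": "ICU", "level": "SICU", "Test-stat": "20.093"},
    {"variable": "mortality", "level": "0", "Test-stat": "991.508"},
    {"variable": "mortality", "level": "1", "Test-stat": "991.508"},
]

for row in blank_repeated_stats(rows):
    print(f'{row["variable"]:<10}{row["level"]:<6}{row["Test-stat"]}')
```

In a real DataFrame with a (variable, level) index the same idea would apply to every row after the first within each variable.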

Changing pval to False breaks it:

 mytable = TableOne(data, columns=columns, categorical=categorical, groupby=groupby, nonnormal=nonnormal, rename=labels, pval=False,test_stat=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jraffa/temp/tableone/tableone/tableone.py", line 424, in __init__
    self.cat_table = self._create_cat_table(data, overall)
  File "/home/jraffa/temp/tableone/tableone/tableone.py", line 1348, in _create_cat_table
    table = table.join(self._htest_table[['Test-stat']])
AttributeError: 'TableOne' object has no attribute '_htest_table'
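
The traceback suggests that `_htest_table` is only created when `pval=True`, so requesting the statistic alone reaches the join before the attribute exists. A minimal sketch of one possible guard follows. `TableSketch` and its methods are a hypothetical stand-in and are not tableone's actual code.

```python
class TableSketch:
    """Hypothetical stand-in for TableOne; only the flag handling is modelled."""

    def __init__(self, pval=False, test_stat=False):
        self._pval = pval
        self._test_stat = test_stat
        # Build the hypothesis-test table when EITHER output is requested,
        # so later code can rely on the attribute existing.
        self._htest_table = None
        if self._pval or self._test_stat:
            self._htest_table = self._create_htest_table()

    def _create_htest_table(self):
        # Placeholder values; a real table would hold one row per variable.
        return {"Test-stat": 1.510, "P-Value": 0.134}

    def extra_columns(self):
        """Names of the hypothesis-test columns that should be displayed."""
        if self._htest_table is None:
            return []
        wanted = []
        if self._test_stat:
            wanted.append("Test-stat")
        if self._pval:
            wanted.append("P-Value")
        return wanted


print(TableSketch(pval=False, test_stat=True).extra_columns())
```

The point of the design is that the two display flags are independent, while the computation they depend on is shared.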

I also don't think the Fisher test is handled appropriately. Fisher's exact test has no real test statistic, so the cell should be blank, but I believe the table reports the chi-squared test statistic alongside the Fisher p-value:

td = pd.DataFrame({'a':[0,0,0,1]*10 + [1],'b':[1,1,1,1]*10 + [0]})
TableOne(td,columns=['a','b'],categorical=['a','b'],pval=True,groupby="b",test_stat=True)

           Grouped by b
                Missing    Overall          0           1 Test-stat P-Value
n                               41          1          40
a, n (%) 1            0  11 (26.8)  1 (100.0)   10 (25.0)     0.280   0.268
         0               30 (73.2)              30 (75.0)     0.280
b, n (%) 0            0    1 (2.4)  1 (100.0)                 9.744   0.024
         1               40 (97.6)             40 (100.0)     9.744

I think the t-test, ANOVA, Mann-Whitney (MW), and Kruskal-Wallis (KW) all have test statistics. @tompollard, are there any other tests we should worry about? I don't think the mode test is reported like this, so it should be safe.
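
The mismatch in the `td` example can be checked by hand. The sketch below recomputes both quantities for the two 2x2 tables in pure Python, assuming the statistic comes from a chi-squared test with Yates' continuity correction and the p-value from a two-sided Fisher exact test. That assumption is an inference from the numbers and not a statement about tableone's internals. Both helper functions are illustrative.

```python
from math import comb


def chi2_yates_2x2(table):
    """Chi-squared statistic with Yates' correction for a 2x2 table.

    Assumes all row and column totals are non-zero.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            # Yates: shrink |O - E| by 0.5, but never below zero
            diff = max(abs(observed - expected) - 0.5, 0.0)
            stat += diff ** 2 / expected
    return stat


def fisher_exact_p_2x2(table):
    """Two-sided Fisher exact p-value: sum the hypergeometric probabilities
    of all tables that are no more likely than the observed one."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    lo, hi = max(0, c1 - r2), min(r1, c1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


# Counts from the td example: rows are a=0, a=1 (or b=0, b=1); columns are b=0, b=1
a_by_b = [[0, 30], [1, 10]]
b_by_b = [[1, 0], [0, 40]]

for name, table in (("a", a_by_b), ("b", b_by_b)):
    print(f"{name} {chi2_yates_2x2(table):.3f} {fisher_exact_p_2x2(table):.3f}")
```

Under these assumptions the printed pairs agree with the Test-stat and P-Value columns above (0.280 and 0.268 for `a`, 9.744 and 0.024 for `b`), which is consistent with the statistic and the p-value coming from two different tests.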

Oct 11 '22 18:10 jraffa

Thanks so much to collaborator @jraffa and owner @tompollard for the help and suggestions. The update contains the following:

1. The redundancy has been cleaned up: the test statistic now appears once per variable, like the p-value. The code is the same as above. The results are as follows:

                                      Missing           Overall                 0                 1 Test-stat P-Value
n                                                          1000               864               136
Age, median [Q1,Q3]                         0  68.0 [53.0,79.0]  66.0 [52.8,78.0]  75.0 [62.0,83.0]    23.882  <0.001
SysABP, mean (SD)                         291      114.3 (40.2)      115.4 (38.3)      107.6 (49.4)     1.510   0.134
Height, mean (SD)                         475      170.1 (22.1)      170.3 (23.2)      168.5 (11.3)     1.030   0.304
Weight, mean (SD)                         302       82.9 (23.8)       83.0 (23.6)       82.3 (25.4)     0.277   0.782
ICU, n (%)          CCU                     0        162 (16.2)        137 (15.9)         25 (18.4)    20.093  <0.001
                    CSRU                             202 (20.2)        194 (22.5)           8 (5.9)
                    MICU                             380 (38.0)        318 (36.8)         62 (45.6)
                    SICU                             256 (25.6)        215 (24.9)         41 (30.1)
mortality, n (%)    0                       0        864 (86.4)       864 (100.0)                     991.508  <0.001
                    1                                136 (13.6)                         136 (100.0)

2. With pval=False, the call no longer breaks:

mytable = TableOne(data, columns=columns, categorical=categorical, groupby=groupby, nonnormal=nonnormal, rename=labels, pval=False,test_stat=True)

                                      Missing           Overall                 0                 1 Test-stat
n                                                          1000               864               136
Age, median [Q1,Q3]                         0  68.0 [53.0,79.0]  66.0 [52.8,78.0]  75.0 [62.0,83.0]    23.882
SysABP, mean (SD)                         291      114.3 (40.2)      115.4 (38.3)      107.6 (49.4)     1.510
Height, mean (SD)                         475      170.1 (22.1)      170.3 (23.2)      168.5 (11.3)     1.030
Weight, mean (SD)                         302       82.9 (23.8)       83.0 (23.6)       82.3 (25.4)     0.277
ICU, n (%)          CCU                     0        162 (16.2)        137 (15.9)         25 (18.4)    20.093
                    CSRU                             202 (20.2)        194 (22.5)           8 (5.9)
                    MICU                             380 (38.0)        318 (36.8)         62 (45.6)
                    SICU                             256 (25.6)        215 (24.9)         41 (30.1)
mortality, n (%)    0                       0        864 (86.4)       864 (100.0)                     991.508
                    1                                136 (13.6)                         136 (100.0)

3. Fisher's exact test does not compute a test statistic, so the Test-stat value for variables tested with it is set to None (shown as nan in the table), and a warning message notifies the user:

                Missing    Overall          0           1 Test-stat
n                               41          1          40
a, n (%) 1            0  11 (26.8)  1 (100.0)   10 (25.0)       nan
         0               30 (73.2)              30 (75.0)
b, n (%) 0            0    1 (2.4)  1 (100.0)                   nan
         1               40 (97.6)             40 (100.0)

[1] Fisher's test did not compute statistics of hypothesis testing. The following variables are affected: a, b.
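
A footnote like the one above could be assembled from a record of which test was applied to each variable. This is a minimal sketch under that assumption. `fisher_footnote`, the `tests_used` mapping, and the test labels are hypothetical names chosen for illustration.

```python
def fisher_footnote(tests_used):
    """Return a footnote naming the variables tested with Fisher's exact test,
    or None if no variable used it.

    `tests_used` maps variable name -> name of the test applied (hypothetical).
    """
    affected = [name for name, test in tests_used.items() if test == "Fisher's exact"]
    if not affected:
        return None
    return (
        "Fisher's test did not compute statistics of hypothesis testing. "
        "The following variables are affected: " + ", ".join(affected) + "."
    )


print(fisher_footnote({"a": "Fisher's exact", "b": "Fisher's exact"}))
print(fisher_footnote({"Age": "Kruskal-Wallis"}))
```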

Oct 22 '22 12:10 260147169
