tableone
improve output format
The current approach relies on pandas to write the table to LaTeX, CSV, and other formats. Adding a custom rendering step would improve the output quality.
One option is tabulate:
from tableone import TableOne
import pandas as pd
import tabulate
# load sample data into a pandas dataframe
url = "https://raw.githubusercontent.com/tompollard/tableone/master/data/pn2012_demo.csv"
data = pd.read_csv(url)
overall_table = TableOne(data, label_suffix=True)
# move the (variable, level) index into ordinary columns
x = overall_table.tableone.reset_index()
# two headers for four columns: tabulate leaves the leading headers blank
t = tabulate.tabulate(x, tablefmt="grid", headers=['isnull', 'overall'], showindex=False)
print(t)
+-------------------+------+----------+--------------+
|                   |      | isnull   | overall      |
+===================+======+==========+==============+
| n                 |      |          | 1000         |
+-------------------+------+----------+--------------+
| Age, mean (SD)    |      | 0        | 65.0 (17.2)  |
+-------------------+------+----------+--------------+
| SysABP, mean (SD) |      | 291      | 114.3 (40.2) |
+-------------------+------+----------+--------------+
| Height, mean (SD) |      | 475      | 170.1 (22.1) |
+-------------------+------+----------+--------------+
| Weight, mean (SD) |      | 302      | 82.9 (23.8)  |
+-------------------+------+----------+--------------+
| ICU, n (%)        | CCU  | 0        | 162 (16.2)   |
+-------------------+------+----------+--------------+
| ICU, n (%)        | CSRU |          | 202 (20.2)   |
+-------------------+------+----------+--------------+
| ICU, n (%)        | MICU |          | 380 (38.0)   |
+-------------------+------+----------+--------------+
| ICU, n (%)        | SICU |          | 256 (25.6)   |
+-------------------+------+----------+--------------+
| MechVent, n (%)   | 0    | 0        | 540 (54.0)   |
+-------------------+------+----------+--------------+
| MechVent, n (%)   | 1    |          | 460 (46.0)   |
+-------------------+------+----------+--------------+
| LOS, mean (SD)    |      | 0        | 14.2 (14.2)  |
+-------------------+------+----------+--------------+
| death, n (%)      | 0    | 0        | 864 (86.4)   |
+-------------------+------+----------+--------------+
| death, n (%)      | 1    |          | 136 (13.6)   |
+-------------------+------+----------+--------------+
# blank out repeated variable names so each label appears only on its first row
isdupe = x.duplicated(subset='variable')
x['variable'] = x['variable'].where(~isdupe, '')
t = tabulate.tabulate(x, tablefmt="grid", headers=['isnull', 'overall'], showindex=False)
print(t)
+-------------------+------+----------+--------------+
|                   |      | isnull   | overall      |
+===================+======+==========+==============+
| n                 |      |          | 1000         |
+-------------------+------+----------+--------------+
| Age, mean (SD)    |      | 0        | 65.0 (17.2)  |
+-------------------+------+----------+--------------+
| SysABP, mean (SD) |      | 291      | 114.3 (40.2) |
+-------------------+------+----------+--------------+
| Height, mean (SD) |      | 475      | 170.1 (22.1) |
+-------------------+------+----------+--------------+
| Weight, mean (SD) |      | 302      | 82.9 (23.8)  |
+-------------------+------+----------+--------------+
| ICU, n (%)        | CCU  | 0        | 162 (16.2)   |
+-------------------+------+----------+--------------+
|                   | CSRU |          | 202 (20.2)   |
+-------------------+------+----------+--------------+
|                   | MICU |          | 380 (38.0)   |
+-------------------+------+----------+--------------+
|                   | SICU |          | 256 (25.6)   |
+-------------------+------+----------+--------------+
| MechVent, n (%)   | 0    | 0        | 540 (54.0)   |
+-------------------+------+----------+--------------+
|                   | 1    |          | 460 (46.0)   |
+-------------------+------+----------+--------------+
| LOS, mean (SD)    |      | 0        | 14.2 (14.2)  |
+-------------------+------+----------+--------------+
| death, n (%)      | 0    | 0        | 864 (86.4)   |
+-------------------+------+----------+--------------+
|                   | 1    |          | 136 (13.6)   |
+-------------------+------+----------+--------------+
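The label-blanking step can be tried in isolation without downloading the dataset. The sketch below is only an illustration: the small frame is a hypothetical stand-in for `TableOne(...).tableone.reset_index()`, with column names and values copied from the tables above.

```python
import pandas as pd

# Hypothetical stand-in for the reset-index TableOne frame (illustrative only).
rows = pd.DataFrame(
    {
        "variable": [
            "Age, mean (SD)",
            "ICU, n (%)",
            "ICU, n (%)",
            "MechVent, n (%)",
            "MechVent, n (%)",
        ],
        "level": ["", "CCU", "CSRU", "0", "1"],
        "overall": [
            "65.0 (17.2)",
            "162 (16.2)",
            "202 (20.2)",
            "540 (54.0)",
            "460 (46.0)",
        ],
    }
)

# duplicated() flags every occurrence of a label after its first one.
is_repeat = rows["variable"].duplicated()

# where() keeps a value where the condition is True and substitutes '' elsewhere,
# so only the first row of each variable keeps its label.
rows["variable"] = rows["variable"].where(~is_repeat, "")

print(rows.to_string(index=False))
```

Note that `duplicated()` flags repeats anywhere in the column, not only on adjacent rows. That is fine here because each variable's rows are grouped together; if labels could recur in separate groups, comparing each row with the previous one (for example `rows["variable"].eq(rows["variable"].shift())`) would be a safer mask.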
The tabulate method was added in 0.6.4:
# import libraries
from tableone import TableOne
import pandas as pd
# load sample data into a pandas dataframe
url="https://raw.githubusercontent.com/tompollard/tableone/master/data/pn2012_demo.csv"
data=pd.read_csv(url)
table = TableOne(data, label_suffix=True)
print(table.tabulate(tablefmt="fancy_grid"))
outputs:
╒═══════════════════╤══════╤═══════════╤══════════════╕
│                   │      │ Missing   │ Overall      │
╞═══════════════════╪══════╪═══════════╪══════════════╡
│ n                 │      │           │ 1000         │
├───────────────────┼──────┼───────────┼──────────────┤
│ Age, mean (SD)    │      │ 0         │ 65.0 (17.2)  │
├───────────────────┼──────┼───────────┼──────────────┤
│ SysABP, mean (SD) │      │ 291       │ 114.3 (40.2) │
├───────────────────┼──────┼───────────┼──────────────┤
│ Height, mean (SD) │      │ 475       │ 170.1 (22.1) │
├───────────────────┼──────┼───────────┼──────────────┤
│ Weight, mean (SD) │      │ 302       │ 82.9 (23.8)  │
├───────────────────┼──────┼───────────┼──────────────┤
│ ICU, n (%)        │ CCU  │ 0         │ 162 (16.2)   │
├───────────────────┼──────┼───────────┼──────────────┤
│                   │ CSRU │           │ 202 (20.2)   │
├───────────────────┼──────┼───────────┼──────────────┤
│                   │ MICU │           │ 380 (38.0)   │
├───────────────────┼──────┼───────────┼──────────────┤
│                   │ SICU │           │ 256 (25.6)   │
├───────────────────┼──────┼───────────┼──────────────┤
│ MechVent, n (%)   │ 0    │ 0         │ 540 (54.0)   │
├───────────────────┼──────┼───────────┼──────────────┤
│                   │ 1    │           │ 460 (46.0)   │
├───────────────────┼──────┼───────────┼──────────────┤
│ LOS, mean (SD)    │      │ 0         │ 14.2 (14.2)  │
├───────────────────┼──────┼───────────┼──────────────┤
│ death, n (%)      │ 0    │ 0         │ 864 (86.4)   │
├───────────────────┼──────┼───────────┼──────────────┤
│                   │ 1    │           │ 136 (13.6)   │
╘═══════════════════╧══════╧═══════════╧══════════════╛
It would be good to left-align the index columns in the dataframe. See the discussion on how to achieve this with Styler at: https://github.com/pandas-dev/pandas/issues/39602
Currently the columns are centered when rendered in a notebook, which looks awkward.