PyHealth
[CS598] New dataset loader class for SBDHs from MIMIC-3
Authors
- Jalen Jiang - jalenj4
- Rodrigo Mata - mata6 (@rodrigomata9)
What
- New dataset loader that joins MIMIC-3 discharge summaries to the SBDH labels released with "MIMIC-SBDH: A Dataset for Social and Behavioral Determinants of Health".
- Joins `NOTEEVENTS.csv` to `MIMIC-SBDH.csv` on `ROW_ID`, and also includes attributes that relate each note to its patient ID and `CHARTTIME`. Importantly, `TEXT` is truncated to only the Social History portion and is joined to the SBDH labels: community-present, community-absent, education, economics, environment, alcohol, tobacco, and drug.
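The join and truncation described above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the loader's actual implementation. The helper names `extract_social_history` and `join_notes_to_labels` are hypothetical. So is the section-boundary heuristic: a Social History section is assumed to end at the next blank line followed by a `Heading:` line. The lowercase `row_id` key in `MIMIC-SBDH.csv` and the `tobacco` label column in the demo are also assumptions. The sketch drops notes that have no matching label row or no detectable Social History section, and the real loader may handle these cases differently.

```python
import csv
import io
import re
from typing import Dict, Iterable, List, Optional

# Assumed heuristic: a section ends at a blank line followed by a "Heading:" line.
_NEXT_HEADER = re.compile(r"\n\s*\n(?=[A-Z][A-Za-z /&-]*:)")


def extract_social_history(text: str) -> Optional[str]:
    """Return the body of the 'Social History:' section, or None if absent."""
    match = re.search(r"^Social History:", text, re.MULTILINE | re.IGNORECASE)
    if match is None:
        return None
    start = match.end()
    nxt = _NEXT_HEADER.search(text, start)
    end = nxt.start() if nxt else len(text)
    return text[start:end].strip()


def join_notes_to_labels(
    notes: Iterable[Dict[str, str]],
    labels: Iterable[Dict[str, str]],
    note_key: str = "ROW_ID",   # NOTEEVENTS row identifier
    label_key: str = "row_id",  # assumed key column name in MIMIC-SBDH.csv
) -> List[Dict[str, str]]:
    """Inner-join notes to SBDH label rows, keeping only the Social History text."""
    by_id = {row[label_key]: row for row in labels}
    joined = []
    for note in notes:
        label_row = by_id.get(note[note_key])
        if label_row is None:
            continue  # assumption: unannotated notes are dropped
        social = extract_social_history(note["TEXT"])
        if social is None:
            continue  # assumption: notes without a Social History section are dropped
        record = {
            "ROW_ID": note[note_key],
            "SUBJECT_ID": note["SUBJECT_ID"],  # patient ID in MIMIC-III
            "CHARTTIME": note["CHARTTIME"],
            "TEXT": social,
        }
        # Copy every label column except the join key.
        record.update({k: v for k, v in label_row.items() if k != label_key})
        joined.append(record)
    return joined


if __name__ == "__main__":
    # Tiny in-memory stand-ins for NOTEEVENTS.csv and MIMIC-SBDH.csv.
    notes_csv = io.StringIO(
        "ROW_ID,SUBJECT_ID,CHARTTIME,TEXT\n"
        '1,100,,"Social History:\nTobacco: 1ppd\n\nFamily History:\nNone"\n'
        '2,200,,"No social history here"\n'
    )
    labels_csv = io.StringIO("row_id,tobacco\n1,1\n2,0\n")  # hypothetical label column
    rows = join_notes_to_labels(csv.DictReader(notes_csv), csv.DictReader(labels_csv))
    print(rows)
```

Keying the labels by ID in a dictionary keeps the join linear in the number of notes. In practice, `NOTEEVENTS.csv` is large enough that a chunked reader or a dataframe library may be preferable.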
Usage
```python
import os

from pyhealth.datasets import SBDHDataset

data_dir = "/path/to/data"  # directory containing NOTEEVENTS.csv and MIMIC-SBDH.csv
output_dir = "/path/to/output"
os.makedirs(output_dir, exist_ok=True)  # create the output directory if it does not exist

# Initialize the dataset. This may take a while, because the Social History
# extraction runs when the class is instantiated.
dataset = SBDHDataset(root=data_dir)

# Display summary statistics; you should see 7025 rows
dataset.stats()

# Export the extracted Social History sections to CSV for training a model
# that classifies note text into SBDH categories
social_history_path = os.path.join(output_dir, "social_history.csv")
dataset.export_social_history(social_history_path)
```