[CS598] New dataset loader class for SBDHs from MIMIC-3

Open jalengg opened this issue 8 months ago • 0 comments

Authors

  • Jalen Jiang - jalenj4
  • Rodrigo Mata - mata6 @rodrigomata9

What

  • New dataset loader that joins MIMIC-3 discharge summaries to the SBDH labels provided by "MIMIC-SBDH: A Dataset for Social and Behavioral Determinants of Health".
  • Joins NOTEEVENTS.csv to MIMIC-SBDH.csv on ROW_ID and also includes attributes that relate each note to its patient ID and charttime. Importantly, TEXT is truncated to only the Social History portion, which is then joined to the SBDH labels: community-present, community-absent, education, economics, environment, alcohol, tobacco, and drug.
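
The join and truncation described above can be sketched roughly as follows. This is a minimal illustration and not the loader's actual code: the function names, the regular expression used to find the Social History section, and the choice to skip notes without that section are all assumptions. Rows are plain dictionaries such as those produced by csv.DictReader.

```python
import re

# Assumption: the section starts at "Social History:" and ends at the next
# blank line followed by a short "Header:" line, or at the end of the note.
SOCIAL_HISTORY_RE = re.compile(
    r"social history:\s*(.*?)(?=\n\s*\n[^\n]{0,60}:|\Z)",
    flags=re.IGNORECASE | re.DOTALL,
)


def extract_social_history(text):
    """Return the Social History section with whitespace collapsed, or None."""
    match = SOCIAL_HISTORY_RE.search(text)
    if match is None:
        return None
    return " ".join(match.group(1).split())


def join_notes_to_labels(note_rows, label_rows):
    """Inner-join note rows to label rows on ROW_ID (hypothetical helper)."""
    labels_by_id = {row["ROW_ID"]: row for row in label_rows}
    joined = []
    for note in note_rows:
        labels = labels_by_id.get(note["ROW_ID"])
        if labels is None:
            continue  # note has no SBDH annotation
        section = extract_social_history(note["TEXT"])
        if section is None:
            continue  # assumption: skip notes with no Social History section
        # Keep the note's other attributes, add the labels, replace TEXT
        joined.append({**note, **labels, "TEXT": section})
    return joined
```

Only notes whose ROW_ID appears in the label file survive the join, which is consistent with the dataset being much smaller than the full NOTEEVENTS table.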

Usage

import os

from pyhealth.datasets import SBDHDataset

data_dir = "/path/to/data"  # directory containing NOTEEVENTS.csv and MIMIC-SBDH.csv
output_dir = "/path/to/output"  # assumed to exist already

# Initialize the dataset
dataset = SBDHDataset(
    root=data_dir,  # may take a while: Social History extraction runs when the class is instantiated
)

# Display stats; should show 7025 rows
dataset.stats()

# Export the extracted Social History text to CSV, ready for training a model that classifies text into SBDH labels
social_history_path = os.path.join(output_dir, "social_history.csv")
dataset.export_social_history(social_history_path)

jalengg • May 08 '25 04:05