PyHealth
Add Synthetic EHR Evaluation Example - Will Hunnius, Jiesen Zhang
Synthetic EHR Data Evaluation Example
Description
This PR adds an example that shows how to evaluate the quality of synthetic Electronic Health Record (EHR) data with PyHealth along three axes: fidelity, utility, and privacy.
Contribution Type
- [x] New example/use case
Based On
Lin et al. (2025) "A Case Study Exploring the Current Landscape of Synthetic Medical Record Generation with Commercial LLMs" - JMLR CHIL 2025 https://arxiv.org/abs/2504.14657
Files Added
- examples/synthetic_ehr_evaluation.py - Main example script
Features
- Fidelity Metrics: KL divergence for distribution matching
- Utility Metrics: TSTR (Train-Synthetic, Test-Real) evaluation
- Privacy Metrics: Membership inference attack evaluation
- Visualizations: Distribution comparisons, ROC curves, summary charts
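To make the fidelity metric above concrete, here is a minimal sketch of KL divergence between the real and synthetic distributions of a single categorical column. It is not taken from the example script: the function name `kl_divergence`, the `epsilon` smoothing, and the toy `gender` values are all assumptions made for illustration, and the script may bin or smooth differently.

```python
import math
from collections import Counter


def kl_divergence(real_values, synthetic_values, epsilon=1e-9):
    """Return KL(P_real || Q_synthetic) in nats for one categorical column.

    Hypothetical helper for illustration; not part of PyHealth.
    """
    categories = set(real_values) | set(synthetic_values)
    real_counts = Counter(real_values)
    synth_counts = Counter(synthetic_values)

    # Add a small constant so a category missing from one dataset
    # does not cause log(0) or a division by zero.
    p = [real_counts[c] + epsilon for c in categories]
    q = [synth_counts[c] + epsilon for c in categories]
    p_total, q_total = sum(p), sum(q)

    return sum(
        (pi / p_total) * math.log((pi / p_total) / (qi / q_total))
        for pi, qi in zip(p, q)
    )


# Toy columns (made up for this sketch, not real patient data).
real_gender = ["F", "M", "F", "F", "M", "F"]
synthetic_gender = ["F", "M", "M", "M", "M", "F"]

print(f"KL(real || real):      {kl_divergence(real_gender, real_gender):.4f}")
print(f"KL(real || synthetic): {kl_divergence(real_gender, synthetic_gender):.4f}")
```

A value near zero means the synthetic column closely matches the real one; larger values indicate a growing mismatch. KL divergence is asymmetric, so the argument order matters.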
Usage
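    # Compare a real EHR dataset against its synthetic counterpart.
    evaluator = SyntheticDataEvaluator(target_column="mortality")
    results = evaluator.evaluate(real_data, synthetic_data)
    # Results are nested by metric family, e.g. results["utility"].
    print(f"TSTR AUC: {results['utility']['tstr_auc']:.3f}")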
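The TSTR AUC printed above comes from training a model on synthetic data and testing it on real data. The sketch below illustrates that idea with scikit-learn, and it is only an assumption about how such a score could be computed, not the script's actual implementation. The helpers `make_toy_cohort` and `tstr_auc`, the logistic regression model, and the randomly generated features are all hypothetical stand-ins.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score


def make_toy_cohort(n_rows, rng, noise=1.0):
    """Hypothetical stand-in for an EHR table: two features, one binary label."""
    X = rng.normal(size=(n_rows, 2))
    logits = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=noise, size=n_rows)
    y = (logits > 0).astype(int)
    return X, y


def tstr_auc(X_synth, y_synth, X_real, y_real):
    """Train on synthetic rows, then report ROC AUC on real rows."""
    model = LogisticRegression(max_iter=1000)
    model.fit(X_synth, y_synth)
    real_scores = model.predict_proba(X_real)[:, 1]
    return roc_auc_score(y_real, real_scores)


rng = np.random.default_rng(0)
X_real, y_real = make_toy_cohort(500, rng)
# The "synthetic" cohort here is just a noisier draw from the same process.
X_synth, y_synth = make_toy_cohort(500, rng, noise=1.5)

auc = tstr_auc(X_synth, y_synth, X_real, y_real)
print(f"TSTR AUC: {auc:.3f}")
```

A TSTR AUC close to the score of a model trained on real data suggests the synthetic data preserves the signal needed for the prediction task. A score near 0.5 suggests it does not.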
Course Information
- Course: CS598 DL4H - Deep Learning for Healthcare
- University: University of Illinois Urbana-Champaign
- Authors: Will Hunnius, Jiesen Zhang
Checklist
- [x] Code follows PEP 8 style guidelines
- [x] Added Google-style docstrings
- [x] Example runs without errors
- [x] Example script is well-documented