Add Synthetic EHR Evaluation Example - Will Hunnius Jiesen Zhang

Open • willhunnius opened this issue 1 month ago • 0 comments

Synthetic EHR Data Evaluation Example

Description

This PR adds an example that evaluates the quality of synthetic Electronic Health Record (EHR) data with PyHealth along three axes: fidelity, utility, and privacy.

Contribution Type

  • [x] New example/use case

Based On

Lin et al. (2025), "A Case Study Exploring the Current Landscape of Synthetic Medical Record Generation with Commercial LLMs", JMLR CHIL 2025. https://arxiv.org/abs/2504.14657

Files Added

  • examples/synthetic_ehr_evaluation.py - Main example script

Features

  • Fidelity Metrics: KL divergence between real and synthetic feature distributions
  • Utility Metrics: TSTR (Train on Synthetic, Test on Real) evaluation
  • Privacy Metrics: Membership inference attack evaluation
  • Visualizations: Distribution comparisons, ROC curves, summary charts
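
To make the fidelity metric concrete, here is a minimal sketch of a KL divergence between the real and synthetic distributions of one categorical column. It is not the code from examples/synthetic_ehr_evaluation.py: the function name kl_divergence, the eps smoothing constant, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def kl_divergence(real_values, synthetic_values, eps=1e-9):
    """KL(real || synthetic) in nats over the categories seen in either sample.

    eps is an illustrative smoothing constant so that a category missing
    from one sample does not produce log(0) or a division by zero.
    """
    categories = sorted(set(real_values) | set(synthetic_values))
    real_counts = Counter(real_values)
    synth_counts = Counter(synthetic_values)

    # Denominators that keep the smoothed probabilities summing to 1.
    real_total = len(real_values) + eps * len(categories)
    synth_total = len(synthetic_values) + eps * len(categories)

    kl = 0.0
    for c in categories:
        p = (real_counts[c] + eps) / real_total
        q = (synth_counts[c] + eps) / synth_total
        kl += p * math.log(p / q)
    return kl


# Toy data (made up for illustration): one categorical column.
real = ["F"] * 50 + ["M"] * 50
synthetic_good = ["F"] * 48 + ["M"] * 52
synthetic_bad = ["F"] * 90 + ["M"] * 10

print(f"KL (close match): {kl_divergence(real, synthetic_good):.4f}")
print(f"KL (poor match):  {kl_divergence(real, synthetic_bad):.4f}")
```

A value near zero means the synthetic column closely reproduces the real category frequencies; larger values indicate a worse match. KL divergence is asymmetric, so the direction (real relative to synthetic here) is itself a design choice.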

Usage

evaluator = SyntheticDataEvaluator(target_column="mortality")
results = evaluator.evaluate(real_data, synthetic_data)
print(f"TSTR AUC: {results['utility']['tstr_auc']:.3f}")
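
For readers unfamiliar with TSTR, the following standard-library sketch shows the idea behind a value like tstr_auc: fit a classifier on synthetic records only, then score it on real records. It is an assumption-laden illustration, not the evaluator's implementation; the helpers train_logistic, predict_scores, roc_auc, and make_cohort are hypothetical, and the cohorts are randomly generated toy data.

```python
import math
import random


def train_logistic(X, y, lr=0.1, epochs=200):
    """Fit a logistic regression by plain batch gradient descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted prob minus label
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_scores(model, X):
    """Return raw linear scores; only their ranking matters for AUC."""
    w, b = model
    return [b + sum(wj * xj for wj, xj in zip(w, xi)) for xi in X]


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_cohort(n, rng, noise):
    """Toy cohort: one feature whose mean shifts with the binary outcome."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(1.5 * label, noise)])
        y.append(label)
    return X, y


rng = random.Random(0)
X_real, y_real = make_cohort(300, rng, noise=1.0)
X_synth, y_synth = make_cohort(300, rng, noise=1.2)  # stand-in for generator output

model = train_logistic(X_synth, y_synth)                   # train on synthetic
tstr_auc = roc_auc(y_real, predict_scores(model, X_real))  # test on real
print(f"TSTR AUC: {tstr_auc:.3f}")
```

A TSTR AUC close to the AUC of a model trained on real data suggests the synthetic data preserves the signal needed for the downstream task; a large gap suggests it does not.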

Course Information

  • Course: CS598 DL4H - Deep Learning for Healthcare
  • University: University of Illinois Urbana-Champaign
  • Authors: Will Hunnius, Jiesen Zhang

Checklist

  • [x] Code follows PEP8 style guidelines
  • [x] Added docstrings with Google style
  • [x] Example runs without errors
  • [x] Notebook is well-documented

willhunnius, Dec 04 '25 03:12