Add ANNPTSummDataset to PyHealth
Background
ANNPTSummDataset packages PhysioNet’s "Medical Expert Annotations of Unsupported Facts in Doctor-Written and LLM-Generated Patient Summaries" dataset, which contains MIMIC-IV discharge contexts, summaries, and hallucination annotations. With it, PyHealth users can study unsupported facts in doctor-written and LLM-generated summaries across all provided splits: full, BHC, filtered 4k/600, hallucination subsets, and cleaned/improved derivatives.
Approach
- Extend the file scanner to ingest NDJSON/JSONL so those tables load natively.
- Add the ann-pt-summ YAML + dataset class and expose it via pyhealth.datasets.
- Add a regression test covering wildcard attributes and JSON ingestion.
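
For reviewers unfamiliar with the format, here is a minimal standard-library sketch of what NDJSON/JSONL ingestion involves: one JSON object per line, blank lines skipped. The names `is_json_lines`, `iter_json_lines`, and `load_table` are hypothetical and used only for illustration; they are not the scanner code in this PR and are not part of the PyHealth API.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator, List

# Assumption: tables in JSON Lines format are recognized by file extension.
JSON_LINES_SUFFIXES = {".jsonl", ".ndjson"}


def is_json_lines(path: Path) -> bool:
    """Return True if the path looks like an NDJSON/JSONL file."""
    return path.suffix.lower() in JSON_LINES_SUFFIXES


def iter_json_lines(path: Path) -> Iterator[Dict[str, Any]]:
    """Yield one dict per non-blank line of an NDJSON/JSONL file."""
    with path.open("r", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(
                    f"{path}:{line_number}: invalid JSON ({exc.msg})"
                ) from exc
            if not isinstance(record, dict):
                raise ValueError(
                    f"{path}:{line_number}: expected a JSON object per line"
                )
            yield record


def load_table(path: Path) -> List[Dict[str, Any]]:
    """Load a whole NDJSON/JSONL table into a list of row dicts."""
    if not is_json_lines(path):
        raise ValueError(f"{path} is not an NDJSON/JSONL file")
    return list(iter_json_lines(path))
```

Streaming line by line keeps memory use bounded when a caller consumes `iter_json_lines` directly, and reporting the line number makes a malformed record easy to locate.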