
Add ANN-PT dataset to PyHealth

Open · machira opened this issue 1 month ago · 0 comments

Background

ANNPTSummDataset packages PhysioNet’s "Medical Expert Annotations of Unsupported Facts in Doctor-Written and LLM-Generated Patient Summaries" (MIMIC-IV discharge contexts, patient summaries, and hallucination annotations). With it, PyHealth users can study unsupported facts in doctor-written and LLM-written summaries across every provided split: full, BHC, filtered 4k/600, the hallucination subsets, and the cleaned/improved derivatives.

Approach

  • Extend the file scanner to ingest NDJSON/JSONL files so those tables load natively.
  • Add the ann-pt-summ YAML config and the dataset class, and expose the class via pyhealth.datasets.
  • Add a regression test that covers wildcard attributes and JSON ingestion.
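For readers unfamiliar with NDJSON/JSONL (one JSON object per line), here is a minimal standalone sketch of what "ingest NDJSON/JSONL" and "wildcard attributes" could mean. It is not PyHealth's scanner: `iter_jsonl`, `select_attributes`, the `"*"` wildcard convention, and the field names in the sample are all hypothetical, and only the Python standard library is used.

```python
import io
import json
from typing import Any, Dict, Iterable, Iterator, List, TextIO


def iter_jsonl(stream: TextIO) -> Iterator[Dict[str, Any]]:
    """Hypothetical helper: yield one dict per non-empty line of an NDJSON/JSONL stream."""
    for lineno, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"invalid JSON on line {lineno}: {exc}") from exc
        if not isinstance(record, dict):
            raise ValueError(f"line {lineno} is not a JSON object")
        yield record


def select_attributes(
    records: Iterable[Dict[str, Any]], attributes: List[str]
) -> List[Dict[str, Any]]:
    """Keep only the requested keys; the assumed '*' wildcard keeps every key."""
    if "*" in attributes:
        return [dict(r) for r in records]
    # Missing keys become None instead of raising, so sparse records still load.
    return [{k: r.get(k) for k in attributes} for r in records]


if __name__ == "__main__":
    # Field names below are illustrative only, not the dataset's real schema.
    sample = io.StringIO(
        '{"text": "discharge note", "summary": "patient summary", "labels": []}\n'
        "\n"
        '{"text": "second", "summary": "s2", "labels": [{"start": 0, "end": 3}]}\n'
    )
    rows = select_attributes(iter_jsonl(sample), ["summary"])
    print(rows)  # [{'summary': 'patient summary'}, {'summary': 's2'}]
```

Parsing line by line keeps memory flat for large files and lets the error message point at the offending line. If pandas is already a dependency, `pandas.read_json(path, lines=True)` reads the same format into a DataFrame in one call.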

machira · Dec 07 '25 14:12