Implement MTS-Dialog-Based Clinical Summary Evaluation
Description:
Integrate the MTS-Dialog dataset into the LangTest framework so that LangTest can evaluate clinical summarization. The goal is to assess structured clinical summaries for medical accuracy against this domain-specific benchmark.
Tasks:
- Add a data loader/parser for the MTS-Dialog dataset.
- Map MTS-Dialog fields to LangTest's summarization task schema.
- Implement support for evaluating structured summaries (e.g., SOAP/EMR format).
- Ensure alignment with evaluation criteria such as factual completeness, hallucination detection, and clinical relevance.
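As a starting point for the loader and field-mapping tasks, here is a minimal sketch. It assumes the MTS-Dialog CSV files expose the columns `ID`, `section_header`, `section_text`, and `dialogue`; please verify this against the actual release. `ClinicalSummarySample` is a hypothetical stand-in, not LangTest's real sample class, and would need to be replaced with whatever the summarization task schema requires.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class ClinicalSummarySample:
    """Hypothetical stand-in for a LangTest summarization sample."""
    sample_id: str
    original: str          # source doctor-patient dialogue
    expected_summary: str  # reference note text for one section
    section_header: str    # note section label, upper-cased


def parse_mts_dialog(rows: Iterable[dict]) -> Iterator[ClinicalSummarySample]:
    """Map raw rows (assumed column names) to samples, skipping incomplete rows."""
    for row in rows:
        dialogue = (row.get("dialogue") or "").strip()
        summary = (row.get("section_text") or "").strip()
        if not dialogue or not summary:
            continue  # a sample needs both an input and a reference
        yield ClinicalSummarySample(
            sample_id=str(row.get("ID", "")).strip(),
            original=dialogue,
            expected_summary=summary,
            section_header=(row.get("section_header") or "").strip().upper(),
        )


def load_mts_dialog(path: str) -> List[ClinicalSummarySample]:
    """Read one MTS-Dialog CSV file from disk."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(parse_mts_dialog(csv.DictReader(handle)))
```

Keeping the parsing step separate from file access makes the mapping easy to unit test with in-memory rows.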
Acceptance Criteria:
- LangTest can load and process MTS-Dialog samples.
- Evaluation metrics specific to clinical summarization are supported.
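To make the second criterion more concrete, below is a rough sketch of two lexical proxies: a completeness score (share of reference content words that the generated summary covers) and an unsupported-content rate (share of summary content words found in neither the dialogue nor the reference). The function name, the stopword list, and both metric definitions are assumptions for discussion. They are crude token-overlap heuristics, not validated clinical metrics, and the real implementation may well use an LLM- or NLI-based check instead.

```python
import re
from typing import Dict, Set

# Small illustrative stopword list (assumption; a real list would be larger).
_STOPWORDS = {"a", "an", "and", "the", "for", "of", "to", "in", "is", "with", "has"}


def _content_tokens(text: str) -> Set[str]:
    """Lower-case alphanumeric tokens with stopwords removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in _STOPWORDS}


def clinical_summary_scores(dialogue: str, reference: str, candidate: str) -> Dict[str, float]:
    """Return lexical proxies for completeness and unsupported content."""
    src = _content_tokens(dialogue)
    ref = _content_tokens(reference)
    cand = _content_tokens(candidate)

    # Fraction of reference content words that appear in the candidate.
    completeness = len(ref & cand) / len(ref) if ref else 1.0

    # Candidate words backed by neither the dialogue nor the reference.
    unsupported = cand - src - ref
    unsupported_rate = len(unsupported) / len(cand) if cand else 0.0

    return {"completeness": completeness, "unsupported_rate": unsupported_rate}
```

For example, with a reference of "Chest pain for two days." and a candidate of "Chest pain and fever.", the candidate covers half of the reference content words, and "fever" counts as unsupported if the dialogue never mentions it.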