
Add speaker-aware transcription

Open juanmc2005 opened this issue 2 years ago • 6 comments

Depends on #144

This PR adds a new SpeakerAwareTranscription pipeline that combines streaming diarization and streaming transcription to determine "who says what" in a live conversation. By default, this is shown as colored words in the terminal.

The feature works as expected with diart.stream and diart.serve/diart.client. The main obstacle to full compatibility with diart.benchmark and diart.tune is the evaluation metric: the pipeline outputs annotated text in the format [speaker0]Hello [speaker1]Hi, so the metric diart.metrics.WordErrorRate counts the speaker labels as insertion errors.

Next steps: implement a SpeakerWordErrorRate that computes the (weighted?) average WER across speakers.
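
One possible shape for that metric is sketched below. It is only an illustration, not diart's API: the function names (`split_by_speaker`, `edit_distance`, `speaker_word_error_rate`) are hypothetical. It assumes the `[speakerN]word` format shown above and that hypothesis labels have already been mapped to the reference labels. The average is weighted by the number of reference words per speaker, which works out to total errors divided by total reference words.

```python
import re
from collections import defaultdict

# Assumed tag format, taken from the example "[speaker0]Hello [speaker1]Hi".
TAG = re.compile(r"\[(speaker\d+)\]")


def split_by_speaker(text):
    """Group the words of an annotated transcript by speaker label."""
    words = defaultdict(list)
    # Splitting on a capturing group yields [prefix, label, chunk, label, chunk, ...].
    parts = TAG.split(text)
    for label, chunk in zip(parts[1::2], parts[2::2]):
        words[label].extend(chunk.lower().split())
    return words


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = curr
    return prev[-1]


def speaker_word_error_rate(reference, hypothesis):
    """Per-speaker WER averaged with weights equal to reference word counts."""
    ref_words = split_by_speaker(reference)
    hyp_words = split_by_speaker(hypothesis)
    total_errors = 0
    total_ref = 0
    for speaker in set(ref_words) | set(hyp_words):
        ref = ref_words.get(speaker, [])
        hyp = hyp_words.get(speaker, [])
        # Words attributed to a speaker absent from the reference count as insertions.
        total_errors += edit_distance(ref, hyp)
        total_ref += len(ref)
    return total_errors / total_ref if total_ref else 0.0


if __name__ == "__main__":
    reference = "[speaker0]hello there [speaker1]hi"
    hypothesis = "[speaker0]hello [speaker1]hi there"
    print(speaker_word_error_rate(reference, hypothesis))
```

In this sketch a word attributed to the wrong speaker is penalized twice, once as a deletion for the correct speaker and once as an insertion for the wrong one. Whether that is desirable, and whether an unweighted average would suit benchmarking better, remains an open design question.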

Changelog

TBD

juanmc2005, Apr 26 '23 12:04