diart
Add speaker-aware transcription
Depends on #144
This PR adds a new SpeakerAwareTranscription pipeline that combines streaming diarization and streaming transcription to determine "who says what" in a live conversation. By default, the result is displayed in the terminal with each word colored according to its speaker.
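The description does not say how transcribed words are matched to diarized speakers. Purely as an illustration, one simple approach is to give each word the speaker whose turn overlaps it the most in time. The sketch below assumes that word timestamps and speaker turns are both available in seconds; the Word and Turn types and the assign_speakers helper are hypothetical and are not part of diart.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Word:
    """A transcribed word with start/end times in seconds (hypothetical type)."""
    text: str
    start: float
    end: float


@dataclass
class Turn:
    """A diarized speaker turn with start/end times in seconds (hypothetical type)."""
    speaker: str
    start: float
    end: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words: List[Word], turns: List[Turn]) -> List[Tuple[Optional[str], str]]:
    """Label each word with the speaker whose turn overlaps it the most.

    Words that overlap no turn at all are labeled None.
    """
    labeled = []
    for word in words:
        best_speaker, best_overlap = None, 0.0
        for turn in turns:
            ov = overlap(word.start, word.end, turn.start, turn.end)
            if ov > best_overlap:
                best_speaker, best_overlap = turn.speaker, ov
        labeled.append((best_speaker, word.text))
    return labeled
```

For example, with the words Hello (0.0-0.5 s) and Hi (1.0-1.3 s) and the turns speaker0 (0.0-0.8 s) and speaker1 (0.9-2.0 s), assign_speakers returns [("speaker0", "Hello"), ("speaker1", "Hi")].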
The feature works as expected with diart.stream and diart.serve/diart.client.
The main thing preventing full compatibility with diart.benchmark and diart.tune is the evaluation metric.
Because the pipeline outputs annotated text of the form [speaker0]Hello [speaker1]Hi, diart.metrics.WordErrorRate counts the speaker labels as insertion errors.
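To make the problem concrete, here is a minimal sketch of two helpers: one that strips the labels so a plain WER can be computed, and one that groups words by speaker. strip_labels and split_by_speaker are hypothetical names, not diart APIs. The parser assumes that a label is attached to the first word of a speaker's segment, as in the example above, and that unlabeled words belong to the most recent label.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional

# A token that starts with a speaker label, e.g. "[speaker0]Hello"
LABELED_TOKEN = re.compile(r"^\[([^\]]+)\](.*)$")
# Any speaker label anywhere in the text
LABEL = re.compile(r"\[[^\]]+\]")


def strip_labels(annotated: str) -> str:
    """Remove every [label] marker and normalize whitespace."""
    return " ".join(LABEL.sub("", annotated).split())


def split_by_speaker(annotated: str) -> Dict[Optional[str], List[str]]:
    """Group words by speaker; unlabeled words go to the last label seen."""
    words: Dict[Optional[str], List[str]] = defaultdict(list)
    current: Optional[str] = None  # words before any label are grouped under None
    for token in annotated.split():
        match = LABELED_TOKEN.match(token)
        if match:
            current, token = match.group(1), match.group(2)
        if token:  # a bare label such as "[speaker1]" carries no word
            words[current].append(token)
    return dict(words)
```

For instance, strip_labels("[speaker0]Hello [speaker1]Hi") gives "Hello Hi", while split_by_speaker("[speaker0]Hello there [speaker1]Hi") gives {"speaker0": ["Hello", "there"], "speaker1": ["Hi"]}.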
Next steps: implement a SpeakerWordErrorRate that computes the (weighted?) average WER across speakers.
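One possible shape for such a metric, sketched under two assumptions: reference and hypothesis are already split into per-speaker word lists, and hypothesis speaker labels have already been mapped to reference labels (diarization labels are arbitrary, so a real metric would first need an optimal speaker mapping). speaker_wer and word_edit_distance are hypothetical names, not existing diart APIs. The weighted flag covers both readings of "(weighted?)": weighting each speaker's WER by its number of reference words, which equals total errors over total reference words, or a plain mean over speakers.

```python
from typing import Dict, List


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete ref word r
                curr[j - 1] + 1,         # insert hyp word h
                prev[j - 1] + (r != h),  # substitute (or match if equal)
            )
        prev = curr
    return prev[-1]


def speaker_wer(
    reference: Dict[str, List[str]],
    hypothesis: Dict[str, List[str]],
    weighted: bool = True,
) -> float:
    """Average WER across reference speakers.

    Assumes hypothesis labels are already mapped to reference labels.
    Hypothesis speakers absent from the reference are ignored.
    """
    speakers = [s for s, words in reference.items() if words]
    if not speakers:
        raise ValueError("reference contains no words")
    errors = {s: word_edit_distance(reference[s], hypothesis.get(s, [])) for s in speakers}
    if weighted:
        # Weighting by reference length is the same as total errors / total reference words
        total_words = sum(len(reference[s]) for s in speakers)
        return sum(errors.values()) / total_words
    # Unweighted: every speaker contributes equally
    return sum(errors[s] / len(reference[s]) for s in speakers) / len(speakers)
```

For example, with a reference of speaker0: "hello there" and speaker1: "hi", and a hypothesis that drops "there", the weighted score is 1/3 and the unweighted score is 0.25. With weighted=True, speakers who talk more dominate the score, as in corpus-level WER; with weighted=False, each speaker counts equally, which exposes poor recognition of a speaker who says little. Words attributed to hypothesis speakers missing from the reference are ignored in this sketch, so a real implementation would need to count them as insertions.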
Changelog
TBD