cdp-backend
Speaker classification
Feature Description
This is the backend issue for the relevant roadmap issue.
Add speaker classification to CDP transcripts. This could be done with a script or class that retroactively attaches speaker names to a transcript that was already produced with speaker diarization enabled. Prodigy can be used to annotate the training data.
Use Case
With speaker classification, we can provide transcripts annotated with each speaker's name. The classification could be run in several ways, such as from a script or a GitHub Action.
Solution
A very high-level idea would be to:
- Use GCP's built-in speaker diarization to separate the speakers. Alternatively, we could create our own audio classification model. We could also use something like Prodigy to annotate the data, though I believe they have their own diarization/transcription models as well.
- Figure out how to add the classified speaker names to the diarized transcript. I'm not sure whether GCP lets you provide training data; from what I could tell, its models only separate the speakers and don't take training data to label them.
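As a rough illustration of the second step, here is a minimal sketch of attaching names to an already diarized transcript. It assumes diarization gives each word an anonymous integer speaker tag, and that a separate classifier (not shown) has produced a tag-to-name mapping for this one recording. `Word`, `Turn`, `group_into_turns`, and `attach_speaker_names` are hypothetical names for illustration, not part of cdp-backend or the GCP client.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Dict, List, Optional


@dataclass
class Word:
    text: str
    speaker_tag: int  # anonymous diarization label (assumed), e.g. 1, 2, 3


@dataclass
class Turn:
    speaker_tag: int
    text: str
    speaker_name: Optional[str] = None  # filled in by classification


def group_into_turns(words: List[Word]) -> List[Turn]:
    """Merge consecutive words that share a speaker tag into one turn."""
    return [
        Turn(speaker_tag=tag, text=" ".join(w.text for w in run))
        for tag, run in groupby(words, key=lambda w: w.speaker_tag)
    ]


def attach_speaker_names(
    turns: List[Turn], tag_to_name: Dict[int, str]
) -> List[Turn]:
    """Return new turns with names attached; unmapped tags keep None."""
    return [
        Turn(t.speaker_tag, t.text, tag_to_name.get(t.speaker_tag))
        for t in turns
    ]
```

For example, calling `attach_speaker_names(group_into_turns(words), {1: "Chair"})` would label every turn tagged `1` and leave the rest unnamed. Because diarization tags are only meaningful within a single recording, the mapping would likely need to be built per transcript.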
A bigger-picture breakdown of all the major components can be found in the roadmap issue under "Major components".