transformers.js

Absolute speaker diarization?

Open flatsiedatsie opened this issue 6 months ago • 6 comments

Question

I've just managed to integrate the new speaker diarization feature into my project. Very cool stuff. My goal is to let people record meetings, summarize them, and then also list per-speaker tasks. This seems to be a popular feature.

One thing I'm running into is that I don't feed Whisper a single long audio file. Instead I use VAD to feed it small chunks of live audio whenever someone speaks.

However, as far as I can tell, the speaker diarization only works "relatively": it distinguishes speakers within a single audio file, so the labels it assigns aren't consistent from one file to the next.

Is there a way to let it detect and 'sort' the correct speaker over multiple audio files? Perhaps it could remember the 'audio fingerprints' of the speakers somehow?
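To make the 'audio fingerprint' idea concrete, here is a rough sketch of what I have in mind. It assumes each audio chunk can somehow be turned into a fixed-length speaker embedding (a plain array of numbers); that step is not shown, and I don't know whether the diarization pipeline exposes anything like it. The `SpeakerRegistry` class, the `identify` method, and the `0.7` threshold are all made up for illustration.

```javascript
// Hypothetical sketch: keep one running-average "fingerprint" per known speaker
// and match each new chunk's embedding against them by cosine similarity.
// Assumption: `embedding` is a fixed-length numeric array produced elsewhere.

function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Fall back to a divisor of 1 so an all-zero vector yields 0 instead of NaN.
  return dot / (Math.sqrt(normA) * Math.sqrt(normB) || 1);
}

class SpeakerRegistry {
  // The default threshold is an illustrative guess and would need tuning.
  constructor(threshold = 0.7) {
    this.threshold = threshold;
    this.speakers = []; // entries: { id, centroid, count }
  }

  // Returns a stable speaker id for this embedding, enrolling a new speaker
  // when nothing already known is similar enough.
  identify(embedding) {
    let best = null;
    let bestScore = -Infinity;
    for (const speaker of this.speakers) {
      const score = cosineSimilarity(embedding, speaker.centroid);
      if (score > bestScore) {
        best = speaker;
        bestScore = score;
      }
    }

    if (best && bestScore >= this.threshold) {
      // Fold the new embedding into the running mean for that speaker.
      best.count += 1;
      for (let i = 0; i < best.centroid.length; i++) {
        best.centroid[i] += (embedding[i] - best.centroid[i]) / best.count;
      }
      return best.id;
    }

    const id = `SPEAKER_${this.speakers.length}`;
    this.speakers.push({ id, centroid: Array.from(embedding), count: 1 });
    return id;
  }
}

// Toy usage with 3-dimensional vectors standing in for real embeddings.
const registry = new SpeakerRegistry();
registry.identify([1, 0, 0]);     // nothing known yet: enrolled as SPEAKER_0
registry.identify([0, 1, 0]);     // dissimilar: enrolled as SPEAKER_1
registry.identify([0.9, 0.1, 0]); // close to the first one: matched to SPEAKER_0
```

With something like this, the per-chunk "relative" labels could be mapped onto ids that stay the same for the whole meeting, as long as the embeddings are comparable across chunks.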

[Image: record_meeting]

flatsiedatsie, Jul 30 '24 15:07