Speaker diarization with pyannote.audio?
I am the creator of the pyannote.audio speaker diarization toolkit.
I understand that you went with @josepatino's PyBK because of its speed, but I'd love to see the pyannote.audio pretrained pipeline integrated into audapolis.
Would that be of interest to you? I'd love to help in some way!
Hey, thanks for reaching out. We would be interested in integrating this 100%. When I started looking at speaker diarization I also noticed pyannote.audio, but as there was no pretrained pipeline at the time, we decided against it.
Do you think it would be possible to extend the `SpeakerDiarization` pipeline to not only report the individual steps of the pipeline via the hook, but also the progress within certain steps? This would be a huge benefit for us.
I have been meaning to add this kind of progress hook for the online demo but it never really reached the top of my priority list.
These are the two steps that account for most of the processing time:
- speaker segmentation, which relies on the `Inference` class that already has some support for a progress hook
- speaker embedding, which is basically a `for` loop that should be easy to adapt to support a progress hook
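
For what it's worth, a custom hook could be wired up roughly like this. This is only a sketch, assuming the `hook` keyword accepted by pyannote.audio 2.x pipelines; the `completed`/`total` arguments would only be passed by a version that reports intra-step progress, and the model name and audio path are placeholders:

```python
# Sketch of a custom progress hook for a pyannote.audio 2.x pipeline.
# Versions without intra-step progress call the hook with just
# (step_name, step_artifact); completed/total stay None in that case.
from pyannote.audio import Pipeline

def on_progress(step_name, step_artifact, file=None, completed=None, total=None):
    if completed is not None and total is not None:
        print(f"{step_name}: {completed}/{total}")
    else:
        print(f"{step_name}: done")

# gated models may require a Hugging Face token via use_auth_token=...
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization")
diarization = pipeline("audio.wav", hook=on_progress)
```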
FYI, I just released a much faster and more accurate version of the pyannote.audio speaker diarization pipeline. It still does not expose the progress of the individual steps, but this is now on my TODO list (though with no ETA).
Wow, I just tried it (and opened https://github.com/pyannote/pyannote-audio/pull/1185/files for the progress). The results are really impressive 😍
I started integrating it and stumbled upon a problem which I'm currently not sure how to solve, so if you have any idea @hbredin, I would be very interested: audapolis currently works on the assumption that there is only one "speaker" at any time. pyannote-audio, on the other hand, supports multiple speakers at the same time and therefore produces overlaps between speakers.
Since changing audapolis to support multiple speakers is too much for now, I'm trying to "flatten" the output of pyannote-audio to one speaker at a time. Do you have a suggestion on how to do that properly?
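
For illustration, this is the kind of overlapping `Annotation` I mean (a toy example with made-up times and labels):

```python
from pyannote.core import Annotation, Segment

ann = Annotation()
ann[Segment(0.0, 5.0)] = "SPEAKER_00"
ann[Segment(4.0, 8.0)] = "SPEAKER_01"  # overlaps SPEAKER_00 between 4.0 and 5.0
```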
Nothing built into pyannote comes to mind. You'd have to postprocess the `pyannote.core.Annotation` instance returned by the pipeline:
- remove any segment fully contained by a larger segment:

```
[------A-------]       ==>  [------A-------]
    [--B--]
```

- split partially overlapping segments in two halves:

```
[----A----]       ==>  [----A--]
      [----B----]              [--B----]
```
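
If it helps, the two rules above could be combined along these lines. This is only a sketch (nothing like it ships with pyannote): it sorts segments by start time, drops fully contained ones, and splits partial overlaps at the midpoint of the overlapping region.

```python
from pyannote.core import Annotation, Segment

def flatten(annotation: Annotation) -> Annotation:
    """Post-process a diarization so at most one speaker is active at a time."""
    tracks = sorted(
        ((segment, label) for segment, _, label in annotation.itertracks(yield_label=True)),
        key=lambda sl: (sl[0].start, -sl[0].duration),
    )
    kept = []  # [start, end, label]; mutable so an entry can be shrunk on split
    for segment, label in tracks:
        if kept and segment.end <= kept[-1][1]:
            # rule 1: fully contained in an already kept segment -> drop
            continue
        if kept and segment.start < kept[-1][1]:
            # rule 2: partial overlap -> split at the midpoint of the overlap
            midpoint = (segment.start + kept[-1][1]) / 2
            kept[-1][1] = midpoint
            kept.append([midpoint, segment.end, label])
        else:
            kept.append([segment.start, segment.end, label])
    flat = Annotation(uri=annotation.uri)
    for start, end, label in kept:
        flat[Segment(start, end)] = label
    return flat
```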
Or you could clip the output of the speaker counting step so that it never exceeds 1: `count.data = np.clip(count.data, 0, 1)` should do the trick...
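
To make the clipping idea concrete, here is a toy illustration. `count` stands in for the per-frame speaker counts computed inside the pipeline (the variable name and its provenance are assumptions), and the clip would have to happen before the count is used to build the final diarization:

```python
# Toy speaker-count track: four 0.5s frames with 0, 1, 2, 1 active speakers.
import numpy as np
from pyannote.core import SlidingWindow, SlidingWindowFeature

frames = SlidingWindow(duration=0.5, step=0.5, start=0.0)
count = SlidingWindowFeature(np.array([[0], [1], [2], [1]]), frames)

# cap the number of simultaneously active speakers at one
count.data = np.clip(count.data, 0, 1)
print(count.data.ravel())  # [0 1 1 1]
```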