
Add support for pyannote segmentation 3.0 (experience sharing)

Open ywangwxd opened this issue 11 months ago • 3 comments

Hi, if anyone like me is working on the feat/diart-asr branch and wants to add support for pyannote segmentation 3.0, here is what I have done.

You only need to make some changes in diarization.py. Segmentation 3.0 outputs activations for seven classes instead of three (for details, please refer to the paper). What's more, the activation values are those before a softmax transformation. So I simply added a softmax transformation and kept only the activations of the three single-speaker classes, ignoring the other four labels. This way it behaves almost the same as the old segmentation model, but be aware that it will miss overlapping speech segments.

[screenshot of the diarization.py diff]
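For readers who want to try this, here is a minimal sketch of the transformation described above. It assumes the usual powerset class order for segmentation 3.0 (no-speech first, then the three single-speaker classes, then the three overlap pairs); double-check this order against your model before relying on it.

```python
import torch

def powerset_logits_to_speakers(logits: torch.Tensor) -> torch.Tensor:
    """Reduce segmentation 3.0 powerset outputs to per-speaker activations.

    logits: (batch, frames, 7) raw outputs, assuming the class order
    [no-speech, spk1, spk2, spk3, spk1+spk2, spk1+spk3, spk2+spk3].
    Returns (batch, frames, 3) probabilities for spk1..spk3 only,
    discarding the no-speech and overlap classes (so overlapping
    speech is lost, as noted above).
    """
    probs = torch.softmax(logits, dim=-1)
    return probs[..., 1:4]  # keep only the three single-speaker classes
```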

ywangwxd avatar Dec 20 '24 09:12 ywangwxd

Hey @ywangwxd, just FYI you can use the Powerset class from pyannote.audio to do this without sacrificing overlapping speech. In particular, take a look at the method called to_multilabel()
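For illustration, a minimal sketch of that suggestion, assuming pyannote.audio 3.x (loading the checkpoint may require a Hugging Face auth token):

```python
import torch
from pyannote.audio import Model
from pyannote.audio.utils.powerset import Powerset

# may require: Model.from_pretrained(..., use_auth_token="hf_...")
model = Model.from_pretrained("pyannote/segmentation-3.0")
model.eval()

# segmentation 3.0: 3 speakers, at most 2 active at once -> 7 powerset classes
powerset = Powerset(num_classes=3, max_set_size=2)

waveform = torch.randn(1, 1, 10 * 16000)  # (batch, channel, samples): 10 s at 16 kHz
with torch.inference_mode():
    scores = model(waveform)  # (batch, frames, 7) powerset scores

# (batch, frames, 3) hard multilabel activations: a frame whose winning
# powerset class is an overlap set activates *both* of its speakers
speakers = powerset.to_multilabel(scores)
```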

juanmc2005 avatar Dec 21 '24 16:12 juanmc2005

> Hey @ywangwxd, just FYI you can use the Powerset class from pyannote.audio to do this without sacrificing overlapping speech. In particular, take a look at the method called to_multilabel()

Thank you, this is the benefit of sharing: someone else may point me to a better solution :-)

ywangwxd avatar Dec 23 '24 02:12 ywangwxd

> Hey @ywangwxd, just FYI you can use the Powerset class from pyannote.audio to do this without sacrificing overlapping speech. In particular, take a look at the method called to_multilabel()

After taking a detailed look, I switched to making the changes in models.py instead of the upper-level diarization.py. Here is a screenshot of the diff; I referred to the corresponding code in version 0.9.1. Just to confirm: this way, any overlapping speech will be labelled as the single speaker with the loudest voice, right?

[screenshot of the models.py diff]
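For reference, here is a rough sketch of what a model-level wrapper along those lines could look like. The PowersetAdapter name and structure are illustrative, not the actual diff from the screenshot:

```python
import torch
import torch.nn as nn
from pyannote.audio.utils.powerset import Powerset

class PowersetAdapter(nn.Module):
    """Hypothetical wrapper: makes a powerset segmentation model look
    like the old multilabel one, so the rest of diart stays untouched."""

    def __init__(self, segmentation: nn.Module,
                 num_speakers: int = 3, max_overlap: int = 2):
        super().__init__()
        self.segmentation = segmentation
        self.powerset = Powerset(num_classes=num_speakers, max_set_size=max_overlap)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # (batch, frames, 7) powerset scores -> (batch, frames, 3) multilabel;
        # to_multilabel takes the argmax class per frame, so a frame whose
        # winning class is an overlap set is assigned to both of its
        # speakers at once, not just the loudest one
        return self.powerset.to_multilabel(self.segmentation(waveform))
```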

ywangwxd avatar Dec 23 '24 07:12 ywangwxd