pyannote-audio Wespeaker embeddings question

If I wanted to use one of the larger wespeaker models - say 293 - would I just download the .pt file and point to it in the config.yaml?

Dec 14 '23 16:12 picheny-nyu

It is a tiny bit more complex than that.

See this script that does most of the job.

If you work on this, it would be nice to share the converted models on Hugging Face, taking this repo as an example.

Dec 15 '23 13:12 hbredin

I have never uploaded a model to Hugging Face before. Is there some way I can give it a similar name, like pyannote/wespeaker-voxceleb-resnet293-LM? If I understand correctly, the code first scans for the keyword "pyannote" in the model name, so I assume another option would be to call it "picheny/pyannote-wespeaker-voxceleb-resnet293-LM". Another concern is that these embeddings are not giving me any improvement on my task (relative to the 34M version). That could just be life, or I might have messed something up...

Dec 19 '23 03:12 picheny-nyu

It should be fine with picheny/wespeaker-voxceleb-resnet293-LM since it will end up using this branch of the code:

https://github.com/pyannote/pyannote-audio/blob/66dd72bb2b807aaf6d011c89678d85b51fb3b859/pyannote/audio/pipelines/speaker_verification.py#L764-L768
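To make the naming question concrete, here is a purely illustrative sketch of keyword-based dispatch on a model name. It is not the actual pyannote.audio code: the function `pick_backend`, the returned labels, the order of the checks, and the fallback are all assumptions made for this example. Only the two keywords ("pyannote" and "wespeaker") come from this thread.

```python
def pick_backend(model_name: str) -> str:
    """Hypothetical helper: choose a loader from a substring of the model name.

    Illustrative only; the check order and fallback are assumptions,
    not taken from pyannote.audio.
    """
    if "pyannote" in model_name:
        # Any name containing "pyannote" is caught first in this sketch.
        return "pyannote"
    if "wespeaker" in model_name:
        # Names without "pyannote" but with "wespeaker" land here.
        return "wespeaker"
    # Assumed fallback for names matching neither keyword.
    return "fallback"


for name in (
    "picheny/wespeaker-voxceleb-resnet293-LM",
    "picheny/pyannote-wespeaker-voxceleb-resnet293-LM",
):
    print(name, "->", pick_backend(name))
```

Under these assumptions, the name without "pyannote" reaches the "wespeaker" branch, while the name containing both keywords is caught by the earlier "pyannote" check.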

Also, in my speaker diarization experiments, larger models did not bring any significant (or consistent) improvement either. That is why I stuck with the ResNet34 version for pyannote/speaker-diarization-3.1.

Larger models might help for speaker verification, though.

Dec 19 '23 08:12 hbredin

OK I will try to put it out there then :-).

Dec 19 '23 15:12 picheny-nyu

@hbredin Thanks for the awesome work!

I want to ask if I need to change the clustering threshold when using wespeaker-voxceleb-resnet293-LM? If so, could you please share the experimental threshold you used when testing the resnet293-LM?

May 20 '24 19:05 akmalmasud96

Yes, you would need to optimize thresholds for each version of the embedding network. However, I did not keep track of the optimized thresholds, sorry.

May 21 '24 06:05 hbredin

@hbredin Thanks for the quick response. Could you please point me to some documentation or guidelines for setting the threshold value? And which dataset should I use for benchmarking?

May 21 '24 06:05 akmalmasud96

Grid search should be fine. For benchmarking, I guess you’d have to use data similar to the expected test/production data.
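A minimal sketch of such a grid search, assuming you can wrap your evaluation in a function that maps a clustering threshold to an error rate (lower is better). The names `grid_search_threshold` and `toy_error` are hypothetical, and `toy_error` is only a stand-in: in practice it would run the pipeline with the given threshold on annotated development data and return the measured error.

```python
from typing import Callable, Iterable, Tuple


def grid_search_threshold(
    evaluate: Callable[[float], float],
    candidates: Iterable[float],
) -> Tuple[float, float]:
    """Return (best_threshold, best_error) over the candidate thresholds."""
    best_threshold = None
    best_error = float("inf")
    for threshold in candidates:
        error = evaluate(threshold)
        # Keep the first threshold that achieves the lowest error so far.
        if error < best_error:
            best_threshold, best_error = threshold, error
    if best_threshold is None:
        raise ValueError("candidates must contain at least one threshold")
    return best_threshold, best_error


def toy_error(threshold: float) -> float:
    # Placeholder objective with its minimum at 0.7 (an arbitrary value
    # chosen for this example, not a recommended threshold).
    return abs(threshold - 0.7)


# Candidate thresholds from 0.05 to 1.95 in steps of 0.05.
candidates = [round(0.05 * i, 2) for i in range(1, 40)]
best, err = grid_search_threshold(toy_error, candidates)
print(best, err)
```

The grid range and step here are arbitrary. Once a coarse search has located a promising region, a narrower range with a finer step can be used around it.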

May 21 '24 07:05 hbredin

@hbredin Currently, I don't have any annotated data. Could you tell me which dataset the existing setup was tuned on? The current configuration is working fine on my data.

May 21 '24 11:05 akmalmasud96
