
About implementing Normalized Maximum Eigengap Spectral Clustering (NME-SC) for Speaker Diarization

Open Zhubisong opened this issue 3 months ago • 2 comments

Thank you for uploading the pre-trained ECAPA-TDNN model.

For speaker diarization, the spectral clustering algorithm in wespeaker uses the p-neighbor binarization scheme, where "p" has to be chosen by hand. I would like to know how to choose "p" for different datasets (such as AMI, DIHARD, MagicData, Callhome or AISHELL4). Is 0.01 OK?
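For readers unfamiliar with the scheme, here is a minimal sketch of p-neighbor binarization. It is not wespeaker's actual code: the function name `p_neighbor_binarize` and the rule for turning "p" into a neighbor count are assumptions made for illustration. Each row of the affinity matrix keeps its top fraction "p" of entries as 1, everything else becomes 0, and the result is symmetrized.

```python
import numpy as np


def p_neighbor_binarize(affinity: np.ndarray, p: float) -> np.ndarray:
    """Hypothetical helper: keep the top-p fraction of each row, then symmetrize."""
    n = affinity.shape[0]
    # Number of neighbors kept per row (assumed rounding rule, at least 1)
    k = max(1, int(round(p * n)))
    binary = np.zeros_like(affinity, dtype=float)
    # Column indices of the k largest similarities in each row
    top_k = np.argsort(affinity, axis=1)[:, -k:]
    rows = np.arange(n)[:, None]
    binary[rows, top_k] = 1.0
    # Average with the transpose so the resulting graph is undirected
    return 0.5 * (binary + binary.T)
```

A small "p" gives a sparse graph that can split one speaker into several components, while a large "p" connects different speakers, which is why a single value rarely suits every dataset.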

In "Auto-Tuning Spectral Clustering for Speaker Diarization Using Normalized Maximum Eigengap", the authors propose NME-SC, an algorithm that frees us from choosing "p". Could wespeaker implement this algorithm?

Zhubisong, Mar 11 '24 06:03

  1. I don't think any fixed "p" performs well on all the datasets you mention, which is exactly why the NME-SC algorithm was proposed and why it works. In my experience, a "p" in [0.01, 0.05] gives reasonable results. You can also refer to the setup in our diarization recipe.
  2. This algorithm essentially enumerates the "p" values and finds the best one on the dev set, which is computationally costly. You can easily implement it on top of our diarization code by adding a for loop over "p". Maybe you can contribute the code when you finish it!
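The "for loop over p" could look roughly like the sketch below. It is a rough illustration, not wespeaker code: the names `nme_ratio` and `nme_search`, the candidate list, and the use of an unnormalized Laplacian are all assumptions. The selection rule follows the NME-SC paper as far as I understand it: for each candidate "p", compute the largest eigengap of the graph Laplacian, normalize it by the largest eigenvalue to get g_p, and keep the "p" that minimizes p / g_p. Under that reading the score comes from the affinity matrix of the recording itself, so no DER on a labeled set is needed inside the loop.

```python
import numpy as np


def nme_ratio(affinity, p, max_num_spks=10, eps=1e-10):
    """Hypothetical helper: return (p / g_p, estimated speaker count) for one p."""
    n = affinity.shape[0]
    k = max(1, int(round(p * n)))  # neighbors kept per row (assumed rule)
    binary = np.zeros_like(affinity, dtype=float)
    top_k = np.argsort(affinity, axis=1)[:, -k:]
    binary[np.arange(n)[:, None], top_k] = 1.0
    adj = 0.5 * (binary + binary.T)  # symmetrized binarized affinity
    # Unnormalized graph Laplacian L = D - A
    laplacian = np.diag(adj.sum(axis=1)) - adj
    eigvals = np.linalg.eigvalsh(laplacian)  # sorted in ascending order
    # Gaps between consecutive small eigenvalues; the largest gap marks
    # the estimated number of clusters
    gaps = np.diff(eigvals[: max_num_spks + 1])
    num_spks = int(np.argmax(gaps)) + 1
    g_p = gaps.max() / (eigvals.max() + eps)  # normalized maximum eigengap
    return p / (g_p + eps), num_spks


def nme_search(affinity, p_candidates, max_num_spks=10):
    """Enumerate candidate p values and keep the one with the smallest ratio."""
    best = None
    for p in p_candidates:
        ratio, num_spks = nme_ratio(affinity, p, max_num_spks)
        if best is None or ratio < best[0]:
            best = (ratio, p, num_spks)
    _, best_p, best_num_spks = best
    return best_p, best_num_spks
```

Usage would be something like `best_p, num_spks = nme_search(cosine_affinity, [0.01, 0.02, 0.05, 0.1])`, where `cosine_affinity` is the N x N similarity matrix of the segment embeddings. Each candidate needs one eigendecomposition, which is where the extra computation comes from.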

JiJiJiang, Mar 11 '24 13:03

This git repo may also help: Auto-Tuning-Spectral-Clustering

JiJiJiang, Mar 22 '24 12:03