Clarification on embeddings training

Open davide-scalzo opened this issue 5 years ago • 4 comments

Hi @CorentinJ! Great repo! I have one question regarding the embeddings training. Are they trained using cosine similarity, Euclidean distance, or some other loss?

I'm trying to use this repo in conjunction with https://github.com/wq2012/SpectralCluster, but the results don't make a lot of sense (see https://github.com/wq2012/SpectralCluster/issues/6). It seems this might be due to an incompatibility between the two libraries if the embeddings are not trained on Euclidean distance. If so, is there a suggested library for a clustering algorithm where the number of speakers is not known in advance?

davide-scalzo avatar Oct 12 '20 18:10 davide-scalzo

It is cosine similarity. As for the issue with kmeans, I found this thread.

I don't know of a good way of determining the number of speakers. I have a vague idea, which I detailed in #10. One of our engineers, @adityatb, has been looking into it recently; he might be of more help.

CorentinJ avatar Oct 12 '20 19:10 CorentinJ
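A note on the cosine vs. Euclidean question above: for unit-length vectors the two are tied by the identity ||a - b||^2 = 2 - 2 cos(a, b), so a Euclidean-based algorithm ranks pairs the same way as cosine similarity once the embeddings are L2-normalized. The minimal sketch below checks that identity with NumPy. The random 256-dimensional vectors are only stand-ins for real embeddings, and whether a given set of embeddings is already unit-length is an assumption worth verifying with `np.linalg.norm`.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row vector to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Assumption: random vectors stand in for two real speaker embeddings.
rng = np.random.default_rng(0)
a, b = l2_normalize(rng.normal(size=(2, 256)))

cosine_sim = float(a @ b)                   # dot product of unit vectors
sq_euclidean = float(np.sum((a - b) ** 2))  # squared Euclidean distance

# For unit vectors: ||a - b||^2 == 2 - 2 * cos(a, b)
assert np.isclose(sq_euclidean, 2.0 - 2.0 * cosine_sim)
```

If the embeddings are not unit-length, normalizing them first (or passing a cosine metric to the clustering library, where it supports one) keeps the clustering consistent with a cosine-trained model.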

Thanks @CorentinJ, I'll look into it!

davide-scalzo avatar Oct 13 '20 09:10 davide-scalzo

Hey @davodesign84. I tried clustering with an unknown number of speakers. With a little tweaking I got some decent results with HDBSCAN. I also looked into X-means and UMAP, which were interesting and might be useful.

adityatb avatar Oct 13 '20 13:10 adityatb
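For readers wondering what clustering without a fixed speaker count can look like in code, here is a rough sketch. It is not HDBSCAN and not the setup described above: it swaps in SciPy's average-linkage agglomerative clustering on cosine distance, cut at a distance threshold, as a simpler stand-in. The synthetic embeddings, the three simulated speakers, and the threshold of 0.5 are all assumptions made for illustration; a real threshold would need tuning on real data.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)

# Assumption: 3 synthetic "speakers", 20 utterance embeddings each,
# built as a unit-length center plus small noise.
centers = rng.normal(size=(3, 256))
centers /= np.linalg.norm(centers, axis=1, keepdims=True)
embeds = np.repeat(centers, 20, axis=0) + 0.02 * rng.normal(size=(60, 256))
embeds /= np.linalg.norm(embeds, axis=1, keepdims=True)

# Build the merge tree using cosine distance (1 - cosine similarity).
tree = linkage(embeds, method="average", metric="cosine")

# Cut the tree at a distance threshold instead of fixing the cluster count.
# The 0.5 value is a hypothetical choice, not a recommended setting.
labels = fcluster(tree, t=0.5, criterion="distance")
n_speakers = len(np.unique(labels))
print(n_speakers)
```

The number of clusters falls out of the threshold rather than being supplied up front, which is the property the question asks for. Density-based methods such as HDBSCAN reach the same goal differently and can also mark outliers as noise.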

Hi @adityatb, that's exactly what I tried, but with little success. Do you happen to remember which parameters you used?

davide-scalzo avatar Oct 13 '20 13:10 davide-scalzo