deep-clustering

Question about cluster permutation

Open ghost opened this issue 6 years ago • 2 comments

Hi! @zhr1201 Thank you for your great work! I ran your code successfully and the separation performance was good. I still have a question, though, and I would appreciate it a lot if you could take a look. In audio_test.py, it seems that you use a boolean (cor[1] > cor[0]) to decide whether to swap the order of cluster1 and cluster2, but the definition of the variable "cor" really confuses me. I wonder why you chose the inner product of the clusters to represent the "rate of persistence" (or something like that). I didn't find it in the original paper. Did I miss something?

ghost commented on Apr 21 '19 08:04

Glad to know you like the repo! (cor[1] > cor[0]) is not from the original paper. It is an intuitive heuristic that might fail on some examples. I assumed that the cluster centroids of the T-F bins belonging to the same person would be closer to each other. But there is no guarantee of that, because the training process does not pull the centroids for the same person closer together. I think concatenating the frames using the right permutation is actually a separate problem, but I didn't dive deeper into it.

zhr1201 commented on Apr 22 '19 03:04
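
For readers following the thread, the centroid heuristic described in the reply above can be sketched roughly as follows. This is not the code from audio_test.py: the function name `should_swap`, the use of cosine similarity, and the 2 x D centroid arrays are all assumptions made for illustration. It only shows the general idea of comparing the "keep" and "swap" assignments by the inner product of cluster centroids.

```python
import numpy as np


def should_swap(prev_centroids, cur_centroids):
    """Decide whether the two clusters of the current chunk should be swapped.

    Hypothetical helper, not taken from the repository.
    prev_centroids, cur_centroids: arrays of shape (2, D), one embedding
    centroid per cluster, for the previous and the current chunk.
    """
    prev = np.asarray(prev_centroids, dtype=float)
    cur = np.asarray(cur_centroids, dtype=float)

    # Normalise rows so the inner product becomes a cosine similarity.
    prev = prev / np.linalg.norm(prev, axis=1, keepdims=True)
    cur = cur / np.linalg.norm(cur, axis=1, keepdims=True)

    # sim[i, j] = similarity between previous cluster i and current cluster j.
    sim = prev @ cur.T

    keep_score = sim[0, 0] + sim[1, 1]  # clusters keep their order
    swap_score = sim[0, 1] + sim[1, 0]  # clusters are exchanged
    return bool(swap_score > keep_score)
```

As noted in the reply, nothing in training forces the centroids of the same speaker to stay close from chunk to chunk, so a rule like this can pick the wrong order.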

Thanks for the reply! I found something about this in the original paper (2016). The author calls it the "permutation problem", and mentions that the oracle permutation is the one that minimizes the ll_distance between the output and the sources. I tried several alternatives to replace that function. However, the best result came from a large Frame_Per_Size that covers the whole utterance. That seems to be a common way to tackle this "permutation problem" (or so-called "speaker tracing problem") :( Have you read the paper titled "Permutation Invariant Training" (Yu, 2017)? There the author also describes the "label ambiguity problem" as a "permutation problem", though from my perspective the two are definitely different. The inconsistent use of "label permutation problem" and "permutation / speaker tracing problem" really confused me for a few weeks, but I think I've got a bit of it now. (Please tell me if I'm still wrong.) Your code helps a lot! Thanks :)

ghost commented on Apr 23 '19 08:04
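
To make the "oracle permutation" idea from the comment above concrete, here is a minimal sketch. It assumes the squared Euclidean distance as the distance measure (the thread does not spell out the exact distance) and uses a hypothetical function name, `oracle_permutation`. It needs the reference sources, so it is only usable for evaluation, not at inference time.

```python
from itertools import permutations

import numpy as np


def oracle_permutation(estimates, references):
    """Find the assignment of estimates to references with the lowest total distance.

    Hypothetical helper for illustration.
    estimates, references: arrays of shape (n_sources, ...), same shape.
    Returns (perm, cost), where perm[i] is the index of the estimate
    matched to reference i.
    """
    estimates = np.asarray(estimates, dtype=float)
    references = np.asarray(references, dtype=float)
    n_sources = references.shape[0]

    best_perm, best_cost = None, np.inf
    # Brute force over all n! orderings; fine for two or three speakers.
    for perm in permutations(range(n_sources)):
        # Assumed distance: sum of squared differences over all sources.
        cost = sum(
            np.sum((estimates[p] - references[i]) ** 2)
            for i, p in enumerate(perm)
        )
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, float(best_cost)
```

Applied per chunk, this resolves the chunk-to-chunk ordering (the "speaker tracing" sense of the problem) when ground truth is available. The same search over permutations, applied to the training loss, is the basic idea behind handling label ambiguity in permutation invariant training.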