deep-clustering
Convergence issue
Hi Haoran Zhou,
did you fix the convergence issue that you mentioned in this discussion https://github.com/jcsilva/deep-clustering/issues/1? I have tested your implementation on different numbers of samples from the wsj0 dataset, and in my case it gets stuck in a very bad local minimum even for small training sets (1000 utterances). I did not change your code; I only tried different optimizers, but that did not bring any improvement.
@isklyar
The loss value was still very big in the end, but that shouldn't be a problem, because the loss is the L2 loss on the affinity matrix, which is of size 12900 * 12900 (166,410,000 elements). When I trained the net on my 30-hour TIMIT dataset (with reverberation), the loss was something like this, which looks weird and not like the loss values we see on other tasks. (About 1 day of training on a Titan X GPU.)
Still about 2,400,000! However, that means only roughly 2 / 166 of the affinity values (a bit over 1%) are not quite right.
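To make that arithmetic concrete, here is a minimal sketch of the affinity loss on a toy example, followed by the ratio for the numbers quoted above. It assumes the loss is the squared Frobenius norm of `V V^T - Y Y^T` (embeddings `V`, one-hot labels `Y`); the function name `affinity_loss` and the toy data are made up for illustration and are not taken from the repository. Reading the loss as a count of wrong affinity entries only holds when each wrong entry contributes about 1, as with the hard 0/1 embeddings used here, so for real continuous embeddings it is a rough estimate.

```python
def affinity_loss(V, Y):
    """Squared Frobenius norm of (V V^T - Y Y^T), computed entry by entry.

    V: list of embedding vectors, one per time-frequency bin (hypothetical toy data).
    Y: list of one-hot label vectors, one per time-frequency bin.
    """
    n = len(V)
    loss = 0.0
    for i in range(n):
        for j in range(n):
            est = sum(a * b for a, b in zip(V[i], V[j]))  # estimated affinity
            ref = sum(a * b for a, b in zip(Y[i], Y[j]))  # ideal affinity
            loss += (est - ref) ** 2
    return loss


# Toy case: 4 bins, 2 speakers, the last bin is embedded with the wrong speaker.
Y = [[1, 0], [1, 0], [0, 1], [0, 1]]
V = [[1, 0], [1, 0], [0, 1], [1, 0]]
toy_loss = affinity_loss(V, Y)  # each wrong affinity entry adds exactly 1 here

# Numbers quoted in this thread.
n_bins = 12900
n_entries = n_bins * n_bins          # size of the affinity matrix
reported_loss = 2_400_000
fraction = reported_loss / n_entries  # rough share of "wrong" entries

print(toy_loss, n_entries, round(fraction, 4))
```

In the toy case a single misassigned bin already changes 6 of the 16 affinity entries, which is why the raw loss looks huge even when most of the matrix is right.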
Then I tried different optimizers, but the loss was still big (about 1,000,000 on some datasets), so I just moved on and wrote a test script using an audio sample. The separation result turned out to be great! The only problem is that with a test sample, the permutation between chunks of frames is hard to decide. When you join chunks by matching the closer cluster centroids, quite often they end up concatenated with the wrong permutation; that's why they used the oracle permutation in the very first DC paper.
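For reference, the centroid-matching idea can be sketched as below. This is only an illustration under assumptions, not the code from the test script: `align_centroids` and the toy centroids are hypothetical, and it simply picks the speaker ordering for the current chunk whose centroids are closest (in squared distance) to those of the previous chunk.

```python
from itertools import permutations


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def align_centroids(prev_centroids, cur_centroids):
    """Return perm so that cur_centroids[perm[k]] is matched to prev_centroids[k].

    Brute force over all orderings, which is fine for 2 or 3 speakers.
    """
    n = len(prev_centroids)
    return min(
        permutations(range(n)),
        key=lambda perm: sum(
            sq_dist(prev_centroids[k], cur_centroids[perm[k]]) for k in range(n)
        ),
    )


# Toy example: the clustering of the current chunk labelled the speakers
# in the opposite order from the previous chunk.
prev = [[1.0, 0.0], [0.0, 1.0]]
cur = [[0.1, 0.9], [0.9, 0.1]]
perm = align_centroids(prev, cur)
print(perm)  # (1, 0): swap the two clusters before concatenating
```

When the centroids of two chunks are not well separated, this matching can pick the wrong ordering, which is the failure described above; an oracle permutation would choose the ordering using the reference sources instead.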
I guess the convergence shouldn't be a problem.
a. You may want to test your trained model on a speech sample directly. The results should be great for different-gender mixtures after only a few hours, and things will get better for same-gender mixtures after ten hours or so.
b. Use a dataset of at least 10h, then test the model. With a small dataset, the mixing is random and the model doesn't generalize well, so you may get poor results even on samples that were used to generate your training set, because that particular mix may not appear in your training set.
Good luck!
Excuse me, I think my question has nothing to do with the current issue, but may I ask how you got the WSJ0 dataset? I opened the LDC website only to find some instructions and no download links. @isklyar
@BigeyeDestroyer Sorry I can't help you with that, I used TIMIT and haven't tried WSJ yet.
Thank you, I emailed the author of deep-clustering and he told me that the WSJ is not freely available. It will cost about $1500 to buy the dataset from LDC. So I am also experimenting on other datasets.