VoiceFilter training setting problem

Hi,

Thank you for publishing your code! I am encountering a training problem. As an initial phase, I tried training on only 1000 samples from the LibriSpeech train-clean-100 dataset. I am using the default configuration as published in your VoiceFilter repo. The only difference is that I used a batch size of 6 due to memory limitations. Is it possible that the problem is related to the small batch size?

Another question concerns the generation of the training and testing sets. I noticed that there is an option to use a VAD when generating the training set, but it is not used by default. What is the best practice: to use the VAD or not?

I appreciate your help!

Jun 27 '20 10:06 Morank88

I managed to make some progress. I am now training on data from LibriSpeech train-clean-100 and train-clean-360 and testing on train-dev-clean. After 40k steps the SDR has reached only ~5. Is it possible that this is related to the batch size I am using (6)?

Another question: what is the learning rate policy? Did you keep it fixed at 1e-3 throughout the whole training, or did you update it?
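For context on the SDR figure above, here is a minimal sketch of the basic SDR definition, 10·log10(signal energy / error energy), in plain Python. It is an illustration only: the helper name `sdr_db` and the toy signals are hypothetical, and the evaluation in the repo may use a different variant (for example a BSS-eval style SDR), so the numbers are not guaranteed to match what TensorBoard reports.

```python
import math


def sdr_db(reference, estimate, eps=1e-12):
    """Basic signal-to-distortion ratio in dB (hypothetical helper).

    reference: clean target samples
    estimate:  model output samples, same length as reference
    eps:       small constant to avoid division by zero / log of zero
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


# Toy example: the estimate is the reference scaled by 0.5, so the error
# energy is a quarter of the signal energy and the SDR is 10*log10(4) dB.
reference = [math.sin(0.1 * n) for n in range(1000)]
estimate = [0.5 * x for x in reference]
print(round(sdr_db(reference, estimate), 2))  # 6.02
```

Under this simple definition, an SDR of about 5 corresponds to the error energy being roughly a third of the target energy.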

Thanks.

Jun 30 '20 08:06 Morank88

Here are the TensorBoard results:

Jun 30 '20 08:06 Morank88

Hi Morank88, did you get any improvement in your SDR? The SDR I'm getting is much worse than even yours.

I've tried a number of runs. At first I used a smaller batch size to run on a lesser graphics card, but the run above used the as-downloaded batch size on an Amazon EC2 instance with an NV100. The only differences are that I set the test sample count to 1000 (it's 100 in the code, but a comment in the README mentioned 1000; maybe I'll change it back to 100 as downloaded and run again) and that I have some likely more up-to-date Python libraries (I couldn't find a compatible torch 1.0.1, for example). Any suggestions?

Thanks in advance,

Nat

Sep 20 '20 08:09 natbuk2

Hi, can you share your settings? I ran into the same situation. Thanks.

Nov 22 '20 12:11 jianew

I also have the same question. The best SDR I get is 8.

Jan 26 '21 09:01 yunzhongfei

I have the same problem as you. How did you solve it?

Mar 17 '22 10:03 zardkk