deep-voice-conversion Net2 cannot converge

I trained Net1 to over 70% accuracy, then loaded the pretrained Net1 to train Net2. However, whatever I do, Net2 does not converge. BTW, I decreased the train2 batch_size from 32 to 16; everything else is unchanged. Here is the Net2 training loss. As a result, Net2 synthesizes a fuzzy sound, because the synthesis loss is pretty high. Has anyone run into a similar problem?

Jan 10 '18 02:01 Lucklady

Hmm, I have not seen that...couple of things:

- What is your training data? Are you using the arctic dataset?
- If you are not using the arctic dataset, are your .wav files using the 16kHz sample rate? I'm not sure if it would cause a problem like you are seeing, but using a different sample rate could potentially cause issues.
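One way to check the sample rate question is to read the rate from each file's header. This is only a rough sketch using Python's standard `wave` module, which handles uncompressed PCM WAV files; the function names and the directory argument are made up for illustration and are not part of the repository.

```python
import wave
from pathlib import Path

EXPECTED_RATE = 16000  # 16 kHz, the rate discussed in this thread


def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in a PCM WAV file's header."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getframerate()


def find_mismatched(root, expected=EXPECTED_RATE):
    """Return (path, rate) pairs for .wav files under root whose rate differs."""
    mismatched = []
    for path in sorted(Path(root).rglob("*.wav")):
        rate = wav_sample_rate(path)
        if rate != expected:
            mismatched.append((path, rate))
    return mismatched
```

Calling `find_mismatched("path/to/your/wavs")` (a placeholder path) returns an empty list when every file is already at 16kHz; anything it does return would need resampling before training.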

Jan 11 '18 22:01 jswilson

Thanks for the reply! Yes, the default sample rate is 16kHz. Apart from decreasing the batch_size and changing the training files' directories, I kept everything else unchanged. I trained Net1 for almost 5k epochs, and it reached a respectable 70% accuracy. But even after training Net2 for 100k steps, the training loss still fluctuates between 2.0 and 5.0 and hardly converges. The author mentioned in #3 that he used one CBHG module in train2 (he removed the CBHG module from the mel part); I'm not sure whether the simplified model matters.

Jan 12 '18 00:01 Lucklady

OK! That is strange; and what is your dataset for train2? Are you using the arctic dataset, or is it something else?

I have a proprietary dataset I'm using for train2 with the model as it exists in the repository, and I can report it does seem to be working: decreasing loss, generally improved results.

Jan 12 '18 15:01 jswilson

Thanks for your reply! Yes, I used the provided arctic slt dataset. Maybe I should replace the provided dataset with a proprietary dataset. BTW, have you been able to train the model in the repository well with the provided dataset?

Jan 13 '18 01:01 Lucklady

Hmm, I'm not sure; I imagine it would work with the provided dataset, but I have not tried it myself yet, sorry!

Jan 15 '18 19:01 jswilson

@Lucklady Hello! Which dataset are you using now? Looking forward to your reply!

Jan 31 '18 08:01 coasxu

Hmm, actually, I did not try a new dataset because I don't have much time for that. If you want to try a new dataset, I suggest using politicians' speech audio: preprocess it with Voice Activity Detection, divide it into pieces of about 5~6 seconds, and feed them to Net2. I hope you can make it work with a new dataset!
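For the splitting step, something like the following could work. It is a minimal sketch using only Python's standard `wave` module, assuming uncompressed PCM WAV input; it cuts at fixed intervals and does not do Voice Activity Detection, which would have to run beforehand with a separate tool. The `split_wav` name, the output file naming, and the 5 second default are my own choices for illustration, not something from the repository.

```python
import wave
from pathlib import Path


def split_wav(src, out_dir, chunk_seconds=5.0):
    """Cut a PCM WAV file into consecutive fixed-length pieces.

    The last piece may be shorter than chunk_seconds.
    Returns the list of written file paths.
    """
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        frames_per_chunk = int(reader.getframerate() * chunk_seconds)
        index = 0
        while True:
            frames = reader.readframes(frames_per_chunk)
            if not frames:
                break  # reached the end of the source file
            out_path = out_dir / f"{src.stem}_{index:04d}.wav"
            with wave.open(str(out_path), "wb") as writer:
                # Copy channels, sample width and rate from the source;
                # the frame count in the header is corrected on close.
                writer.setparams(params)
                writer.writeframes(frames)
            written.append(out_path)
            index += 1
    return written
```

Fixed-length cuts can land in the middle of a word, so cutting at the silences found by VAD would probably give cleaner training pieces.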

Feb 01 '18 13:02 Lucklady

@Lucklady Thanks for your reply! Could you tell me what your longest training time for train2 was?

Feb 02 '18 06:02 coasxu

@jswilson Where can I find that proprietary dataset you're talking about?

Apr 01 '18 18:04 VictoriaBentell
