
The output encoder

Status: Open. nkcdy opened this issue 5 years ago • 6 comments

It seems that the output encoder should be an extra module with the same structure and the same weights as the input encoder. But it is very difficult to get my training to converge. Correct me if I am wrong.

nkcdy, Jul 30 '19 11:07

The decoder and encoder are different; they don't share weights.

auspicious3000, Aug 02 '19 10:08

What I mean is the content encoder applied to the output signal, not the decoder.

nkcdy, Aug 02 '19 13:08

Yes, just feed the reconstruction back into the same encoder.

auspicious3000, Aug 02 '19 18:08
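To make "feed the reconstruction back into the encoder" concrete, here is a minimal PyTorch sketch, assuming a simplified AutoVC-style setup. It shows that no second "output encoder" module is needed: the reconstructed mel spectrogram is passed through the *same* encoder object, and an L1 penalty compares the two content codes. Everything named here (`ToyContentEncoder`, `ToyDecoder`, `autovc_style_losses`, `lambda_cd`, the layer sizes) is hypothetical and is not the repo's actual code; the real model also has a downsampling bottleneck, a postnet, and a second reconstruction term, all omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyContentEncoder(nn.Module):
    """Hypothetical stand-in for the content encoder: maps a mel spectrogram
    (batch, frames, n_mels) plus a speaker embedding to a content code
    (batch, frames, code_dim). A single linear layer, purely for illustration."""

    def __init__(self, n_mels=80, spk_dim=256, code_dim=32):
        super().__init__()
        self.proj = nn.Linear(n_mels + spk_dim, code_dim)

    def forward(self, mel, spk_emb):
        # Repeat the utterance-level speaker embedding for every frame.
        spk = spk_emb.unsqueeze(1).expand(-1, mel.size(1), -1)
        return self.proj(torch.cat([mel, spk], dim=-1))


class ToyDecoder(nn.Module):
    """Hypothetical decoder: content code + speaker embedding -> mel frames."""

    def __init__(self, n_mels=80, spk_dim=256, code_dim=32):
        super().__init__()
        self.proj = nn.Linear(code_dim + spk_dim, n_mels)

    def forward(self, code, spk_emb):
        spk = spk_emb.unsqueeze(1).expand(-1, code.size(1), -1)
        return self.proj(torch.cat([code, spk], dim=-1))


def autovc_style_losses(encoder, decoder, mel, spk_emb, lambda_cd=1.0):
    """Self-reconstruction loss plus a content-code loss (assumed weighting)."""
    code = encoder(mel, spk_emb)        # content code of the input
    mel_hat = decoder(code, spk_emb)    # reconstruction with the same speaker
    # The reconstruction goes back through the SAME encoder instance, so
    # there is only one set of encoder weights and the code loss updates it.
    code_hat = encoder(mel_hat, spk_emb)
    loss_recon = F.mse_loss(mel_hat, mel)
    loss_code = F.l1_loss(code_hat, code)
    return loss_recon + lambda_cd * loss_code, loss_recon, loss_code


if __name__ == "__main__":
    torch.manual_seed(0)
    enc, dec = ToyContentEncoder(), ToyDecoder()
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
    mel = torch.randn(2, 128, 80)   # fake batch: 2 utterances, 128 frames
    spk = torch.randn(2, 256)       # fake speaker embeddings
    total, recon, code_l = autovc_style_losses(enc, dec, mel, spk)
    opt.zero_grad()
    total.backward()
    opt.step()
    print(f"total={total.item():.4f} recon={recon.item():.4f} code={code_l.item():.4f}")
```

Reusing one module is the design point: building a separate encoder and copying weights into it would either freeze those weights (no gradient from the code loss) or let the two copies drift apart, and neither matches "feed the reconstruction back into the encoder."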

@nkcdy Have you solved your speaker feature extraction problem? How did you do it?

liveroomand, Aug 09 '19 07:08

@nkcdy I am also implementing this paper and have run into many of the same problems as you. Can I get in touch with you? I've been able to do voice conversion, but there's still some background noise.

liveroomand, Aug 09 '19 07:08

@liveroomand @nkcdy Did either of you ever figure out the secret sauce for training a model whose loss converges to 0.0001 and that yields audio of similar quality to the pretrained model? I get very noisy conversions at 100k iterations, but after 1M iterations I get conversions of similar quality to the examples produced by the provided pre-trained network. I have also been using the training code that was uploaded to the repo 3 months ago.

Trebolium, Dec 12 '20 19:12