
questions for vocoder

Open · LqNoob opened this issue 3 years ago · 1 comment

Hi, @haoheliu. Thank you for your awesome work.

  1. After reading the code for the vocoder part, I found that there is only a pre-trained model and no training steps. Why is there no implementation of this part? Under what circumstances was the pre-trained model obtained, and how does it perform?
  2. The vocoder in the original TFGAN paper does not include the subband discriminator (and there is no implementation of this part either). Since I did not find a relevant explanation in the paper, what help or impact does the subband discriminator have on the model?

If I can get an answer, it will help me a lot. Thank you.

LqNoob · Dec 15 '21 09:12

Hi @LqNoob, I'm not sure whether you still need the answer. Many apologies for the late reply. These are good questions.

  1. The implementation of TFGAN is confidential because it is part of ByteDance's codebase, so I cannot open-source it. If you are interested, you can refer to this repo, which has a similar implementation to ours. To achieve speaker independence, you need at least 1000+ speakers in the training dataset.
  2. We use a subband discriminator to enhance the discriminative power of the GAN. We believe this helps TFGAN achieve a better vocoding result (a rough sketch of the idea is given after this list).
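
To illustrate what "subband discriminator" refers to here, below is a minimal, hypothetical PyTorch sketch. It is not the actual TFGAN/ByteDance implementation: the class name `SubBandDiscriminator`, the `n_bands`/`channels` parameters, the layer sizes, and the simple interleaved (polyphase-style) band split are all assumptions made for illustration; a production system would more likely split the waveform with PQMF analysis filters. The idea shown is just that each subband of the waveform is scored by its own small convolutional discriminator, giving the GAN an extra, frequency-localized real/fake signal.

```python
import torch
import torch.nn as nn


class SubBandDiscriminator(nn.Module):
    """Hypothetical sketch of a subband waveform discriminator.

    The waveform is split into `n_bands` sub-streams (here via a simple
    interleaved split; a real system would typically use PQMF analysis
    filters), and each subband is judged by its own small 1-D
    convolutional discriminator.
    """

    def __init__(self, n_bands: int = 4, channels: int = 32):
        super().__init__()
        self.n_bands = n_bands
        self.band_discriminators = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(1, channels, kernel_size=15, stride=2, padding=7),
                nn.LeakyReLU(0.2),
                nn.Conv1d(channels, channels, kernel_size=15, stride=2, padding=7),
                nn.LeakyReLU(0.2),
                nn.Conv1d(channels, 1, kernel_size=3, padding=1),  # per-frame real/fake score
            )
            for _ in range(n_bands)
        ])

    def forward(self, wav: torch.Tensor):
        # wav: (batch, 1, time); trim so time is divisible by n_bands
        b, _, t = wav.shape
        t = t - t % self.n_bands
        # interleaved split into n_bands sub-sampled streams: (batch, n_bands, time // n_bands)
        bands = wav[:, 0, :t].reshape(b, -1, self.n_bands).permute(0, 2, 1)
        # return one score map per band; the GAN loss sums over them
        return [d(bands[:, i:i + 1]) for i, d in enumerate(self.band_discriminators)]


if __name__ == "__main__":
    disc = SubBandDiscriminator(n_bands=4)
    fake_wav = torch.randn(2, 1, 16000)   # 1 second of 16 kHz audio
    scores = disc(fake_wav)
    print([s.shape for s in scores])      # one (batch, 1, frames) score map per band
```

In this sketch the per-band scores would simply be added to the usual full-band adversarial loss, so the generator is penalized separately for artifacts in each frequency region.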

Thanks

haoheliu · Apr 11 '22 14:04