FastVocoder

Multiband Architecture

Open Rongjiehuang opened this issue 3 years ago • 7 comments

Hi author, I found the note "the generated audio has interference at a specific frequency" in this repo. I have run into a straight line at a specific frequency while developing a similar multiband architecture, and I wonder whether this is the phenomenon you mentioned. Do you have any advice or solutions? Thanks.

Rongjiehuang avatar Jul 25 '21 12:07 Rongjiehuang

You can refer to https://github.com/xcmyz/FastVocoder/blob/main/bin/synthesize.py#L79

xcmyz avatar Jul 28 '21 12:07 xcmyz

Hi, I tried it and found that the trick does not solve this problem. Because the synthesized sound takes random values that differ between the two synthesis passes, the subtraction can overshoot. E.g., at some position a clean segment (0.02, 0.05, 0.06) minus a bias (0.05, 0.05, 0.02) gives (-0.03, 0, 0.04), which means that the first value gets worse.
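The arithmetic in this example can be checked with a minimal sketch. The function name `subtract_bias` and the optional `floor` argument are hypothetical and not taken from the repo; clamping at zero is only meaningful if the values are magnitudes, which is an assumption here.

```python
def subtract_bias(segment, bias, floor=None):
    """Subtract a bias estimate element-wise; optionally clamp at `floor`.

    Hypothetical helper for illustration, not the repo's implementation.
    """
    out = [s - b for s, b in zip(segment, bias)]
    if floor is not None:
        out = [max(v, floor) for v in out]
    return out


segment = [0.02, 0.05, 0.06]  # values of the clean segment from the example
bias = [0.05, 0.05, 0.02]     # bias estimated from a separate synthesis pass

raw = subtract_bias(segment, bias)
clamped = subtract_bias(segment, bias, floor=0.0)  # assumes magnitudes

print([round(v, 2) for v in raw])      # first value overshoots below zero
print([round(v, 2) for v in clamped])  # overshoot clamped to the floor
```

Clamping hides the negative value but does not recover the original 0.02, so it limits the damage from a mismatched bias rather than fixing it.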

Rongjiehuang avatar Jul 28 '21 15:07 Rongjiehuang

In my case, it solves the checkerboard artifacts problem. Maybe you can add some lower-quality speech to the training data, such as aishell3. I combine the biaobei data and aishell3 for training, and with that the problem can be solved. Besides, you can try applying the μ-law algorithm to each band and normalizing each band separately to fix the problem.
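One possible reading of the per-band μ-law suggestion is sketched below: each sub-band is peak-normalized on its own and then companded with the standard μ-law formula. The function names, the choice of peak normalization, and `mu=255` are assumptions for illustration; the thread does not specify them.

```python
import math


def mu_law_compress(x, mu=255.0):
    """Standard mu-law companding: maps x in [-1, 1] to [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=255.0):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def normalize_band(band, eps=1e-8):
    """Scale one sub-band to unit peak; return the scaled band and the gain."""
    peak = max(abs(v) for v in band)
    gain = 1.0 / max(peak, eps)  # eps guards against an all-zero band
    return [v * gain for v in band], gain


def compand_bands(bands, mu=255.0):
    """Peak-normalize each sub-band independently, then mu-law compress it."""
    companded, gains = [], []
    for band in bands:
        scaled, gain = normalize_band(band)
        companded.append([mu_law_compress(v, mu) for v in scaled])
        gains.append(gain)
    return companded, gains


# Two toy sub-bands with very different levels.
bands = [[0.5, -0.25, 0.1], [0.01, -0.02, 0.005]]
companded, gains = compand_bands(bands)
print(gains)
```

Keeping the per-band gains makes the step invertible: expand with `mu_law_expand` and divide by the stored gain to get back the original band.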

xcmyz avatar Jul 30 '21 15:07 xcmyz

Hi, I have also run into a straight line at a specific frequency while developing a similar multiband architecture, for example multi-band MelGAN. Do you have a trick to solve it now?

RuqiaoLiu avatar Oct 15 '21 09:10 RuqiaoLiu

There are three general approaches to removing these constant lines:

  1. Train for more steps.
  2. Add a discriminator (works in GAN-based waveform generation).
  3. After PQMF synthesis, pass the full-band waveform through an additional conv layer.
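To give some intuition for why the third approach can work, the sketch below shows that even a 3-tap convolution can cancel a tone at one fixed frequency. The kernel here is set by hand, whereas in a real model the extra conv layer would be trained with the rest of the network. The sample rate, line frequency, and function name are hypothetical values chosen for illustration.

```python
import math


def conv1d_causal(x, kernel):
    """Causal FIR convolution: y[n] = sum_k kernel[k] * x[n - k], zero-padded."""
    return [
        sum(kernel[k] * x[n - k] for k in range(len(kernel)) if n - k >= 0)
        for n in range(len(x))
    ]


sample_rate = 16000   # hypothetical sample rate
line_freq = 4000.0    # hypothetical frequency of the constant line
omega = 2 * math.pi * line_freq / sample_rate

# Hand-set 3-tap kernel with zeros at +/- omega on the unit circle.
notch = [1.0, -2.0 * math.cos(omega), 1.0]

# A pure tone standing in for the interference line in the full-band waveform.
tone = [math.sin(omega * n) for n in range(64)]
filtered = conv1d_causal(tone, notch)

# Skip the 2-sample start-up transient caused by zero padding.
residual = max(abs(v) for v in filtered[2:])
print(residual < 1e-9)
```

A kernel of `[1.0, 0.0, 0.0]` would leave the waveform unchanged, so a trained layer can fall back to the identity when no correction is needed.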

Rongjiehuang avatar Oct 15 '21 11:10 Rongjiehuang

Is it better than HiFi-GAN?

ysujiang avatar Feb 22 '22 09:02 ysujiang

@Rongjiehuang Thanks, the last piece of advice works for me!

HaiFengZeng avatar Feb 08 '23 09:02 HaiFengZeng