ParallelWaveGAN
Breakpoint problem
Hi @kan-bayashi, when using this project to train a multi-band MelGAN vocoder, there are always breakpoints in the generated audio. I tried modifying the kernel size of the first layer and the kernel size of the upsample layers, but neither change eliminated the phenomenon.
When I listen to audio produced by the released model, there is a similar problem; please refer to the attached sample for details:
LJ050-0033_gen.zip
Is there any way to eliminate it? Thank you!
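One way to check objectively where such breakpoints occur is to scan the waveform for abrupt sample-to-sample jumps. A minimal sketch: the filename is assumed to be the WAV extracted from the attachment above, and the 0.5 threshold is an arbitrary illustration value, not a tuned number.

```python
# Rough breakpoint locator: flags abrupt sample-to-sample jumps in a mono WAV.
# Filename and the 0.5 threshold are assumptions for illustration only.
import numpy as np
import soundfile as sf

wav, sr = sf.read("LJ050-0033_gen.wav")  # mono, float values in [-1, 1]
jumps = np.abs(np.diff(wav))             # sample-to-sample differences
idx = np.where(jumps > 0.5)[0]           # suspiciously large jumps
for i in idx:
    print(f"possible breakpoint at sample {i} ({i / sr:.3f} s)")
```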
Sorry for the late reply. Unfortunately, I have no clear idea how to solve this problem. A few comments:
- I recently fixed a PQMF problem. It may affect the quality.
- How about increasing stacks to expand the receptive field? (See the receptive-field sketch below.)
- Does the breakpoint happen at the same position every time (e.g., a specific time, phoneme, etc.)?
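For a rough sense of how stacks changes the receptive field, here is a generic sketch for a chain of dilated convolutions with MelGAN-style exponentially increasing dilations. The exact layer layout in this repo may differ, so treat the numbers as order-of-magnitude estimates.

```python
# Receptive field (in samples, at that layer's rate) of a chain of dilated
# convolutions whose dilation grows as kernel_size ** i, as in MelGAN-style
# residual stacks. Generic estimate only; the repo's exact layout may differ.
def dilated_receptive_field(stacks: int, kernel_size: int = 3) -> int:
    rf = 1
    for i in range(stacks):
        rf += (kernel_size - 1) * (kernel_size ** i)
    return rf

for s in (3, 4, 5):
    print(s, dilated_receptive_field(s))  # 3 -> 27, 4 -> 81, 5 -> 243
```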
Thank you for your reply!
- The new PQMF did not resolve this problem.
- Breakpoints appear randomly.
- I will try increasing stacks!
Hi @kan-bayashi! Increasing stacks did not resolve this problem. I suspect the deconvolution (transposed convolution) kernel is responsible.
Thank you for sharing your experiments. In #216, @LLianJJun suggested a better config. It is worth trying.
@kan-bayashi Thanks! I will try this!
I also see this issue in PWG with singing data. Has anybody solved this problem?
@zpcoftts I tried changing the convolution kernel size, increasing the number of stacks, modifying the discriminator, removing weight normalization, etc., but none of them solved this problem.
@maozhiqiang Have you tried increasing "batch_max_steps"?
@LLianJJun Not yet. Does it affect the sound quality? My config is as follows:
sample_rate=16000, batch_max_steps=8000
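For context, that setting makes each training segment quite short. A quick calculation; hop_size = 200 is only an assumed example value, so substitute the one from your config:

```python
# How long one training segment is under this config.
# hop_size = 200 is an assumed example value; use the one from your config.
sample_rate = 16000
batch_max_steps = 8000
hop_size = 200

print(batch_max_steps / sample_rate)   # 0.5 seconds of audio per training segment
print(batch_max_steps // hop_size)     # 40 mel frames per training segment
```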
@maozhiqiang
I'm not sure. However, in my case the breakpoint appears within a continuous voiced segment, so I suspect the cause is the receptive field or the speech segment size. I will share the results after the experiment.
@LLianJJun thanks! I changed the receptive field by setting stacks=5, but the problem remains.
I also meet this issue, but it does not appear in audio from the pretraining stage; it only appears after the discriminator is introduced.
hi @LLianJJun. Have you solved this problem?
@maozhiqiang @LLianJJun @OnceJune @kan-bayashi @Alexey322 Hi all, have you solved this problem? This phenomenon also appears in my dataset. I have tried the following methods, but none of them solved it well:
- increasing the frame_length and frame_shift settings for the multi-resolution STFT loss (see the loss sketch after this message)
- employing larger generators and discriminators
- fine-tuning the vocoder using force-aligned mels from the Taco2 model

Any suggestions for me? Many thanks.
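For reference, a minimal sketch of the multi-resolution STFT loss being tuned above (spectral convergence plus log-magnitude L1, averaged over several resolutions). The (fft_size, hop, win) triplets below are the commonly used defaults and are assumptions here; check them against your own config.

```python
# Minimal multi-resolution STFT loss sketch (PyTorch).
# The resolution triplets are assumed example values, not your config.
import torch

def stft_mag(x, fft_size, hop, win):
    window = torch.hann_window(win, device=x.device)
    spec = torch.stft(x, fft_size, hop, win, window, return_complex=True)
    return spec.abs().clamp(min=1e-7)

def mr_stft_loss(x, y, resolutions=((1024, 120, 600), (2048, 240, 1200), (512, 50, 240))):
    """x: generated waveform, y: reference waveform, both shaped (batch, samples)."""
    loss = 0.0
    for fft_size, hop, win in resolutions:
        mx = stft_mag(x, fft_size, hop, win)
        my = stft_mag(y, fft_size, hop, win)
        sc = torch.norm(my - mx, p="fro") / torch.norm(my, p="fro")        # spectral convergence
        mag = torch.nn.functional.l1_loss(torch.log(mx), torch.log(my))    # log-magnitude L1
        loss = loss + sc + mag
    return loss / len(resolutions)
```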
@GuangChen2016 Hi, I'm now using hifigan trained for 2M+ steps, then fine-tuned with GTA mels, and it has no breakpoints inside phonemes.
@OnceJune Thanks for your reply. Yeah, hifigan is much better and has almost no breakpoints inside phonemes. However, it's much slower than melgan. What are your configs for hifigan, such as the upsample_scales for the generator and the hop size? And which training script did you use? Did you use the config_v1.json and training scripts here, or did you modify anything? Thanks again.
@GuangChen2016 hifigan v1 has good audio quality, but it is large and slow. I used v2 with hop size 256, and the inference speed is good enough for me. You can also make hifigan multi-band.
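One sanity check when changing the generator's upsampling or the hop size: the upsample rates must multiply out to exactly one hop's worth of samples per mel frame, otherwise the output length will not match the input frames. A tiny check, using [8, 8, 2, 2] as an assumed example for a 256-sample hop; verify against the actual config_v2.json.

```python
# The generator's upsample rates must multiply out to the mel hop size,
# e.g. 8 * 8 * 2 * 2 = 256 samples per mel frame. Values here are assumed
# examples; check them against the actual config you are training with.
import math

upsample_rates = [8, 8, 2, 2]
hop_size = 256
assert math.prod(upsample_rates) == hop_size, "upsample rates must match hop size"
```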
@OnceJune Yeah, hifigan v1 has good audio quality and no breakpoints, but when I moved to hifigan v2, breakpoints sometimes appear. Which repo did you use to train your hifigan v2 models, this repo or the official one? By the way, did you modify or add any additional loss to improve the results for hifigan v2?
@GuangChen2016 The official one, I didn't modify any layers or add any loss.
@OnceJune Thank you very much, I also used the official one. Could you send me some samples from hifigan v2?
@GuangChen2016 Sorry, I'm using a commercial dataset. How many steps did you train hifigan for? hifigan might need 1.5M+ steps to reach stable quality.
@OnceJune Many thanks. I have trained the model for 2M steps, but I haven't fine-tuned hifigan v2 with GTA yet. Also, could you describe the quality compared with LPCNet and Melgan-stft, as well as the robustness?