HiFi-GAN

multiband-hifigan

Open nukes opened this issue 4 years ago • 10 comments

Hi, did you try the multiband HiFi-GAN idea?

nukes avatar Feb 23 '21 09:02 nukes

Nope, I haven't tried it yet, but I am planning to.

rishikksh20 avatar Feb 24 '21 04:02 rishikksh20

Hi, any update on this? I tried this idea, but the result is not good. I use a combination of fullband STFT + subband STFT + mel + adv losses, and the predicted waveform has an artifact in a specific frequency bin. After 400k steps, this artifact still has not disappeared. I want to know whether you see the same issue and whether you still use the mel loss as part of the generator loss.
image

nukes avatar Mar 02 '21 08:03 nukes

@nukes I trained it for around 1M steps. These artifact bands disappeared at around 800k, and the quality is also good.

rishikksh20 avatar Mar 02 '21 08:03 rishikksh20

Good news! What I observe is that this artifact appears periodically: for example, it disappears at 300k, then reappears at 310k. Did you observe the same pattern? And do you use the mel loss?

nukes avatar Mar 02 '21 09:03 nukes

@nukes Yes, after 800k that periodicity decreased, and most of the time there are few or no artifacts. The mel loss sometimes throws an error because the generated audio exceeds a value of 1, which causes a problem when we convert the wav to mels for the loss calculation. It doesn't happen often, mostly around 20k to 40k steps. So I start training with the mel loss, adv loss, STFT, and sub-band STFT losses, and around 20k, when the mel loss error pops up, I just comment out the mel loss. For the remaining training I only use the STFT and sub-band STFT losses with the adv loss.
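
For illustration only, here is a minimal sketch of that loss combination with an automatic guard in place of manually commenting out the mel loss. This is not the code used for the training described above: the function name `generator_loss`, its arguments, the `limit` threshold, and the unweighted sum are all assumptions, and the waveform is represented as a plain list of floats to keep the example self-contained.

```python
def generator_loss(fake_wave, adv_loss, stft_loss, sub_stft_loss,
                   mel_loss_fn, limit=1.0):
    """Sum the generator loss terms, skipping the mel term when unsafe.

    fake_wave     : generated samples as a flat list of floats (assumed format)
    adv_loss,
    stft_loss,
    sub_stft_loss : already-computed scalar losses (hypothetical names)
    mel_loss_fn   : callable that computes the mel loss from fake_wave;
                    it is only called when the waveform stays within range
    limit         : largest allowed absolute sample value (assumed to be 1.0)
    """
    # Peak absolute amplitude of the generated audio.
    peak = max((abs(s) for s in fake_wave), default=0.0)

    # These three terms are always used.
    total = adv_loss + stft_loss + sub_stft_loss

    # Only add the mel term when the audio does not exceed the limit,
    # so the wav-to-mel conversion is never run on out-of-range audio.
    used_mel = peak <= limit
    if used_mel:
        total += mel_loss_fn(fake_wave)
    return total, used_mel
```

For example, `generator_loss([0.5, -0.9], 1.0, 2.0, 3.0, lambda w: 4.0)` includes the mel term, while the same call with `[1.2, -0.3]` drops it. In a real setup the loss weights would need tuning; they are left out here on purpose.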

rishikksh20 avatar Mar 02 '21 09:03 rishikksh20

Got it! I am still training my model, and I will let you know the result once it reaches 800k.

nukes avatar Mar 02 '21 11:03 nukes

Also, do you think it is worth trying a MultiStepLR learning rate scheduler, just like mb-melgan? I saw that the subband loss fluctuates dramatically here, while the mb-melgan learning curve is much smoother and its periodic artifact disappears around 300k-400k.
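
To make the suggestion concrete, here is a small sketch of the decay rule a MultiStepLR-style schedule follows: the learning rate is multiplied by `gamma` each time a milestone step is passed. The base learning rate, milestones, and `gamma` in the usage line are placeholder values chosen for illustration, not the mb-melgan settings. In PyTorch the same behaviour is provided by `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma)`.

```python
from bisect import bisect_right


def multistep_lr(step, base_lr, milestones, gamma):
    """Learning rate at a given step under a multi-step decay schedule.

    step       : current training step
    base_lr    : initial learning rate
    milestones : sorted list of steps at which the rate is decayed
    gamma      : multiplicative decay factor applied at each milestone
    """
    # Number of milestones that are <= step, i.e. already reached.
    passed = bisect_right(milestones, step)
    return base_lr * gamma ** passed


# Placeholder schedule: halve the rate at 200k, 400k, and 600k steps.
for s in (0, 200_000, 450_000, 800_000):
    print(s, multistep_lr(s, 1e-3, [200_000, 400_000, 600_000], 0.5))
```

A stepwise decay like this lowers the rate late in training, which is one plausible reason the mb-melgan curve looks smoother, though that is a guess rather than something verified here.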

nukes avatar Mar 02 '21 12:03 nukes

@nukes Yeah, I have the same thought on that.

rishikksh20 avatar Mar 02 '21 12:03 rishikksh20

Hi, I tried the "mb-hifigan" idea, but the result is not good. In the high-frequency bins the structure is quite blurry, while the original HiFi-GAN (org-hifigan) performs much better in the high-frequency bins. Did you see the same result? org-hifigan: image mb-hifigan: image

nukes avatar Mar 15 '21 03:03 nukes

Hi, what's the result of mb-hifigan now? Is it better?

ysujiang avatar Dec 01 '21 03:12 ysujiang