iSTFTNet-pytorch

How about the audio quality?

Open OnceJune opened this issue 3 years ago • 6 comments

Hi, thanks for the implementation; the inference speed is impressive. How is the audio quality? And have you tried the v2 config? Thanks in advance.

OnceJune avatar Apr 22 '22 06:04 OnceJune

Quality is better than HiFi-GAN v1, with less training.

rishikksh20 avatar Apr 26 '22 05:04 rishikksh20

Hi, I trained this model several times with different schedules and did not get acceptable audio from the inference scripts. If you got good results, could you share your training hyperparameters? Also, what data does it use (sampling rate in kHz, spectrogram settings, etc.)? I also wonder how the STFT and mel computation here differs from the one in Tacotron 2.

Thank you!

SolomidHero avatar Jun 20 '22 14:06 SolomidHero

@rishikksh20 ?

SolomidHero avatar Jun 28 '22 20:06 SolomidHero

@SolomidHero I will check, but I think the audio should be good. I have trained this model on 4 datasets, including LJSpeech, and it performs well: not as good as reported in the paper, but still decent enough.

rishikksh20 avatar Jun 29 '22 07:06 rishikksh20

We tested it on multiple datasets, and it works better than HiFi-GAN in speed as well as quality. Please follow the same pre-processing and hyperparameters mentioned in the repo.

rishikksh20 avatar Jun 29 '22 07:06 rishikksh20

> We tested it on multiple datasets, and it works better than HiFi-GAN in speed as well as quality. Please follow the same pre-processing and hyperparameters mentioned in the repo.

However, I found that the attenuation coefficients b1 and b2 in the paper differ from those in the ".json" file, and that the sizes of the test, validation, and training sets in the paper are inconsistent with those in the code, so I don't know which version should be followed.

a897456 avatar Feb 29 '24 14:02 a897456
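
One way to pin down this kind of mismatch is to compare the values in the ".json" config against the values read from the paper, field by field. The snippet below is only a minimal sketch: the key names (`adam_b1`, `adam_b2`, `sampling_rate`) are assumptions about how such a config might be laid out, and every number is a placeholder, not a value taken from the repo or from the paper.

```python
import json


def diff_config(config, reference):
    """Return {key: (config_value, reference_value)} for each key in
    `reference` whose value is missing from or different in `config`."""
    mismatches = {}
    for key, expected in reference.items():
        actual = config.get(key)  # None if the key is absent
        if actual != expected:
            mismatches[key] = (actual, expected)
    return mismatches


# Placeholder config text; in practice this would be read from the
# repo's ".json" file with json.load(open(path)).
config_text = '{"adam_b1": 0.1, "adam_b2": 0.2, "sampling_rate": 16000}'
config = json.loads(config_text)

# Placeholder "paper" values, to be filled in by hand from the paper.
paper_values = {"adam_b1": 0.3, "adam_b2": 0.2, "sampling_rate": 16000}

for key, (in_config, in_paper) in diff_config(config, paper_values).items():
    print(f"{key}: config={in_config}, paper={in_paper}")
```

Listing the differing fields this way makes it easier to ask the maintainer which source a released checkpoint was actually trained with.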