iSTFTNet-pytorch How about the audio quality?

Open OnceJune opened this issue 3 years ago • 6 comments

Hi, thanks for the implementation; the inference speed is impressive. How about the audio quality? And have you tried the v2 config? Thanks in advance.

Apr 22 '22 06:04 OnceJune

Quality is better than HiFi-GAN v1, with less training.

Apr 26 '22 05:04 rishikksh20

Hi, I trained this model several times with different scheduling and didn't get acceptable audio from the inference scripts. Can you share some training hyperparameters if that is the case? Also, what data does it use (what kHz, spectrum, etc.)? I also wonder how the STFT and mel here differ from those in Tacotron 2.

Thank you!

Jun 20 '22 14:06 SolomidHero

@rishikksh20 ?

Jun 28 '22 20:06 SolomidHero

@SolomidHero I will check, but I think the audio should be good. I have trained this model on 4 datasets, including LJSpeech, and it performs well: not as good as reported in the paper, but still decent enough.

Jun 29 '22 07:06 rishikksh20

We tested it on multiple datasets and it works better than HiFi-GAN in both speed and quality. Please follow the same pre-processing and hyperparameters mentioned in the repo.

Jun 29 '22 07:06 rishikksh20

"We tested it on multiple datasets and it works better than HiFi-GAN in both speed and quality. Please follow the same pre-processing and hyperparameters mentioned in the repo."

However, I found that the attenuation coefficients b1 and b2 in the paper differ from those in the ".json" file, and the sizes of the test, validation, and training sets are also inconsistent between the paper and the code, so I don't know which version to follow.

Feb 29 '24 14:02 a897456
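
For anyone checking the mismatch described in the last comment, here is a minimal sketch of one way to list which hyperparameters differ between a ".json" config and values copied by hand from the paper. Everything specific in it is an assumption: the key names `b1`, `b2` and `learning_rate`, the helper `diff_hparams`, and all numeric values are hypothetical placeholders, not the repo's actual config keys or the paper's actual numbers.

```python
import json


def diff_hparams(config, reference):
    """Return {key: (config_value, reference_value)} for every key in
    `reference` whose value differs from (or is missing in) `config`."""
    diffs = {}
    for key, ref_value in reference.items():
        cfg_value = config.get(key)  # None when the key is absent from the config
        if cfg_value != ref_value:
            diffs[key] = (cfg_value, ref_value)
    return diffs


# Stand-in for the contents of the ".json" file. Key names and values are
# hypothetical; in practice, load the real file with json.load(open(path)).
config_text = '{"b1": 0.8, "b2": 0.99, "learning_rate": 0.0002}'
config = json.loads(config_text)

# Placeholder values: replace these with the numbers stated in the paper.
paper = {"b1": 0.5, "b2": 0.99}

mismatches = diff_hparams(config, paper)
for key, (in_config, in_paper) in mismatches.items():
    print(f"{key}: config={in_config} paper={in_paper}")
```

Only keys listed in `paper` are checked, so extra entries in the config are ignored; a key missing from the config shows up with `config=None`.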
