iSTFTNet-pytorch
How about the audio quality?
Hi, thanks for the implementation; the inference speed is impressive. How is the audio quality? And have you tried the v2 config? Thanks in advance.
Quality is better than HiFi-GAN v1, with less training.
Hi, I trained this model several times with different scheduling and didn't get acceptable audio from the inference scripts. Can you share some training hyperparameters, in case that is the cause? Also, what data does it use (sample rate in kHz, spectrogram type, etc.)? I also wonder how the STFT and mel computation differs from that in Tacotron 2.
Thank you!
@rishikksh20 ?
@SolomidHero I will check, but I think the audio should be good. I have trained this model on 4 datasets, including LJSpeech, and it performs well: not as good as reported in the paper, but still decent enough.
We tested it on multiple datasets and it works better than HiFi-GAN in both speed and quality. Please follow the same pre-processing and hyperparameters mentioned in the repo.
However, I found that the decay coefficients b1 and b2 in the paper differ from those in the ".json" file, and the sizes of the test, validation, and training sets are also inconsistent between the paper and the code, so I don't know which version to follow.
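One way to pin down exactly which settings disagree is to diff the config against the values you read from the paper. This is only a minimal sketch: the key names `adam_b1`, `adam_b2`, and `learning_rate` are assumptions about how the ".json" file names these settings, and every number below is an illustrative placeholder, not a value taken from the paper or the repo.

```python
import json


def diff_hyperparameters(config, reference):
    """Return {key: (config_value, reference_value)} for every key in
    `reference` whose value in `config` is missing or different."""
    mismatches = {}
    for key, expected in reference.items():
        actual = config.get(key)  # None if the key is absent from the config
        if actual != expected:
            mismatches[key] = (actual, expected)
    return mismatches


# Stand-in for the repo's config file. In practice, load the real file with
# json.load(). Key names and values here are placeholders (assumptions).
config = json.loads('{"adam_b1": 0.8, "adam_b2": 0.99, "learning_rate": 0.0002}')

# Fill these in from the paper yourself; the numbers are placeholders.
paper_values = {"adam_b1": 0.5, "adam_b2": 0.9, "learning_rate": 0.0002}

mismatches = diff_hyperparameters(config, paper_values)
for key, (in_config, in_paper) in mismatches.items():
    print(f"{key}: config={in_config}, paper={in_paper}")
```

Listing the mismatches this way makes it easier to ask the maintainer about each setting specifically, or to train once per variant and compare.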