Rishikesh (ऋषिकेश)
@snakers4 Can we fine-tune the VAD on our own data? We have in-house segmented data; I'd just like to ask whether it is possible to fine-tune this model or...
A fix should be possible in the next PyTorch release: https://github.com/pytorch/pytorch/issues/36428
Convert the audio to a mel-spectrogram with 128 bins; then you can treat the mel-spectrogram as an image.
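A minimal NumPy sketch of that conversion (the FFT size, hop length, and sample rate below are assumptions for illustration, not values from this thread; in practice you'd likely use librosa or torchaudio):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels=128):
    # Triangular filters spaced evenly on the mel scale
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            if center > left:
                fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            if right > center:
                fb[i - 1, k] = (right - k) / (right - center)
    return fb

def mel_spectrogram(y, sr=16000, n_fft=1024, hop=256, n_mels=128):
    # Frame the signal, window it, FFT -> power spectrogram
    window = np.hanning(n_fft)
    frames = [np.abs(np.fft.rfft(y[s:s + n_fft] * window)) ** 2
              for s in range(0, len(y) - n_fft + 1, hop)]
    power = np.array(frames).T                        # (n_fft//2+1, time)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power   # (128, time)
    return np.log(mel + 1e-9)                         # log-compress: image-like range

# 1 s of a 440 Hz tone as a stand-in for real speech
sr = 16000
t = np.arange(sr) / sr
m = mel_spectrogram(np.sin(2 * np.pi * 440 * t), sr=sr)
print(m.shape)  # (128, num_frames) -- a single-channel "image"
```

The resulting (128, time) array can be fed to any 2D-convolutional image model, optionally normalized per-utterance first.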
Nope, I haven't tried, but I am planning to.
@nukes I trained it for around 1M steps; these artefact bands disappeared around 800k, and the quality is also good.
@nukes Yes, after 800k that periodicity decreased, and most of the time the artifacts are few or none. The mel loss throws an error because the generated audio exceeds the value of...
@nukes Yeah, I have the same thought on that.
Right now I think iSTFTNet is the best compared to both HiFi-GAN and HiFi++. Currently, I am struggling to implement the HiFi++ architecture correctly; the author didn't share much info regarding training...
@Liujingxiu23 https://deepmind.com/research/publications/End-to-End-Adversarial-Text-to-Speech does the same. We mostly use two different models, one for text-to-mel and another (a vocoder) for mel-to-waveform, just to simplify things. End-to-end models are...
Converting text to wav directly is a very costly task, so we need better ways to deal with it; we generally use an intermediate feature, i.e. the mel-spectrogram, then text...
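The two-stage split described above can be sketched as follows. Everything here is a placeholder for illustration (the function names, the 80-bin mel size, and the hop length of 256 are assumptions, not part of any real model in this thread); the point is only the interface: text goes to an acoustic model that predicts mels, and a separate vocoder turns mels into audio samples:

```python
import numpy as np

def acoustic_model(text):
    # Placeholder for a text-to-mel model (e.g. a Tacotron/FastSpeech-style
    # network): returns an (n_mels, time_frames) mel-spectrogram.
    n_mels, frames_per_char = 80, 5
    return np.random.rand(n_mels, len(text) * frames_per_char)

def vocoder(mel, hop=256):
    # Placeholder for a neural vocoder (e.g. a HiFi-GAN-style network):
    # upsamples each mel frame by the hop length into waveform samples.
    return np.zeros(mel.shape[1] * hop, dtype=np.float32)

mel = acoustic_model("hello world")   # stage 1: text -> mel
wav = vocoder(mel)                    # stage 2: mel -> waveform
print(mel.shape, wav.shape)
```

Because the mel-spectrogram is a compact, low-rate representation, each stage is cheaper to train than one network mapping text directly to raw samples.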