Rishikesh (ऋषिकेश)
@snakers4 Can we fine-tune the VAD on our own data? We have in-house segmented data; I'd just like to ask whether it is possible to fine-tune this model or...
A fix should be possible in the next PyTorch release: https://github.com/pytorch/pytorch/issues/36428
Convert the audio to a mel-spectrogram with 128 bins; then you can treat the mel-spectrogram as an image.
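A minimal NumPy sketch of that conversion (the FFT size, hop length, and sample rate below are assumptions for illustration, not values from this thread; in practice you'd likely use librosa or torchaudio):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels=128):
    # Triangular filters spaced evenly on the mel scale
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            if center > left:
                fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            if right > center:
                fb[i - 1, k] = (right - k) / (right - center)
    return fb

def mel_spectrogram(y, sr=16000, n_fft=1024, hop=256, n_mels=128):
    # Frame the signal, window it, FFT -> power spectrogram
    window = np.hanning(n_fft)
    frames = [np.abs(np.fft.rfft(y[s:s + n_fft] * window)) ** 2
              for s in range(0, len(y) - n_fft + 1, hop)]
    power = np.array(frames).T                        # (n_fft//2+1, time)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power   # (128, time)
    return np.log(mel + 1e-9)                         # log-compress: image-like range

# 1 s of a 440 Hz tone as a stand-in for real speech
sr = 16000
t = np.arange(sr) / sr
m = mel_spectrogram(np.sin(2 * np.pi * 440 * t), sr=sr)
print(m.shape)  # (128, num_frames) -- a single-channel "image"
```

The resulting (128, time) array can be fed to any 2D-convolutional image model, optionally normalized per-utterance first.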
Nope, I haven't tried, but I am planning to.
@nukes I trained it for around 1M steps; these artefact bands disappeared around 800k, and the quality is also good.
@nukes Yes, after 800k that periodicity decreased, and most of the time the artifacts are few or none. The mel loss throws an error because the generated audio exceeds the value of...
@nukes Yeah, I have the same thought on that.
Right now I think iSTFTNet is the best compared to both HiFi-GAN and HiFi++. Currently, I am struggling to implement the HiFi++ architecture correctly; the author didn't share much info regarding training...
@Liujingxiu23 https://deepmind.com/research/publications/End-to-End-Adversarial-Text-to-Speech does the same. We mostly use two different models, one for text-to-mel and another (a vocoder) for mel-to-waveform, just to simplify things. End-to-end models are...
Converting text to wav directly is a very costly task, so we need better ways to deal with it; we generally use an intermediate feature, i.e. the mel-spectrogram, then text...
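The two-stage split described above can be sketched as follows. Everything here is a placeholder for illustration (the function names, the 80-bin mel size, and the hop length of 256 are assumptions, not part of any real model in this thread); the point is only the interface: text goes to an acoustic model that predicts mels, and a separate vocoder turns mels into audio samples:

```python
import numpy as np

def acoustic_model(text):
    # Placeholder for a text-to-mel model (e.g. a Tacotron/FastSpeech-style
    # network): returns an (n_mels, time_frames) mel-spectrogram.
    n_mels, frames_per_char = 80, 5
    return np.random.rand(n_mels, len(text) * frames_per_char)

def vocoder(mel, hop=256):
    # Placeholder for a neural vocoder (e.g. a HiFi-GAN-style network):
    # upsamples each mel frame by the hop length into waveform samples.
    return np.zeros(mel.shape[1] * hop, dtype=np.float32)

mel = acoustic_model("hello world")   # stage 1: text -> mel
wav = vocoder(mel)                    # stage 2: mel -> waveform
print(mel.shape, wav.shape)
```

Because the mel-spectrogram is a compact, low-rate representation, each stage is cheaper to train than one network mapping text directly to raw samples.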