Soshyant comments

Results: 40 comments by Soshyant

How to extract pitch from an English speaking audio file?

What config are you using, though? Please show me your mel preprocessing steps.

When cloning, how can we make the generated multiple audios consistent?

Call torch.manual_seed(some integer) right before running the inference function.
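A minimal sketch of that idea, assuming PyTorch is installed. `run_inference` here is a hypothetical stand-in for whatever inference function the model actually exposes, and the seed value is arbitrary; the only point is that re-seeding immediately before each call makes the random draws repeat.

```python
import torch


def run_inference(num_samples: int = 8) -> torch.Tensor:
    # Hypothetical placeholder: a real model would sample noise internally
    # while generating audio. Here we only draw the noise itself.
    return torch.randn(num_samples)


def seeded_inference(seed: int, num_samples: int = 8) -> torch.Tensor:
    # Reset PyTorch's random number generator right before inference,
    # so every call with the same seed starts from the same state.
    torch.manual_seed(seed)
    return run_inference(num_samples)


first = seeded_inference(1234)
second = seeded_inference(1234)

# Same seed, same call: the sampled values should match exactly.
print(torch.equal(first, second))
```

Seeding once at the top of a script is not enough if inference is run several times, because each run advances the generator state. That is why the seed is set again before every call.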

Training and Inference Code

Any plans to release the training code?

(Q) Multi/Single Speaker different language finetune

Yeah, a few individuals, such as yours truly, have done it.

(Q) Multi/Single Speaker different language finetune

Japanese, on a 21-hour single-speaker dataset.

Is there a GUI for training?

Hey. Not that I know of. This model is made of many modules, so I doubt it'll be easy to do that.

Slow download from HuggingFace Hub (capped at 10.5 MB/s)

It goes beyond these models. I believe it's an HF thing and not a LitGPT one.

To train a multilingual model for multiple Indian languages

Codecs and, to a great extent, vocoders are usually language-agnostic, so you should be fine either way. Alternatively, the Nvidia audio codec, which was released a while ago, especially the...

24khz & 48khz is indistinguishable

> From the spectrogram on my end, extending the speech from 24kHz to 48kHz appears to be working fine.
>
> I’m not entirely sure if, by “the outputs are...

24khz & 48khz is indistinguishable

> @yxlu-0102 to extend bandwidth properly from 24kHz to 48kHz, do we have to run `python inference_16k.py` or `python inference_48k.py`?

If the target sampling rate is 48 kHz, then inference_48 must...