Michael Conrad

23 comments by Michael Conrad

Have been working on TTS stuff and haven't had time to try and figure it out.

I've also tried the following and now I'm getting "RuntimeError: input.size(-1) must be equal to input_size. Expected 80, got 386":

```python
mel_npy: array = numpy.load(os.path.join(cwd, "tmp.npy"))
mel_npy = mel_npy.reshape((1, mel_npy.shape[0],...
```

I finally figured out it needed a transpose, but the generated wav is all silence?

```python
mel_npy: array = numpy.load(os.path.join(cwd, "tmp.npy")).transpose()
mel_npy = mel_npy.reshape((1, mel_npy.shape[0], mel_npy.shape[1]))
mel_tensor: Tensor = torch.tensor(mel_npy).to("cuda")...
```
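
For anyone hitting the same "Expected 80, got 386" error, here is a minimal NumPy-only sketch of why the transpose matters. It assumes the saved array is laid out as (mel bins, frames), which is what the error message suggests. The zero-filled array is just a stand-in for `tmp.npy`, and the names `wrong` and `right` are made up for illustration.

```python
import numpy as np

N_MELS = 80     # feature size the model expects (from the error message)
N_FRAMES = 386  # frame count reported in the error message

# Stand-in for the saved spectrogram, assumed to be stored as (n_mels, n_frames).
mel = np.zeros((N_MELS, N_FRAMES), dtype=np.float32)

# Reshaping alone only adds a batch axis; the last axis is still the frame count.
wrong = mel.reshape((1, mel.shape[0], mel.shape[1]))

# Transposing first puts the mel bins on the last axis.
mel_t = mel.transpose()
right = mel_t.reshape((1, mel_t.shape[0], mel_t.shape[1]))

print(wrong.shape)  # (1, 80, 386)
print(right.shape)  # (1, 386, 80)
```

This only fixes the shape. It says nothing about whether the values are in the range the vocoder was trained on.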

The following seems to work. Definitely different sounding... [universalvocoding.zip](https://github.com/bshall/UniversalVocoding/files/7355747/universalvocoding.zip)

```python
mel_npy: array = numpy.load(os.path.join(cwd, "tmp.npy")).transpose()
top_db = 80
mel_npy = numpy.maximum(mel_npy, -top_db)
mel_npy = mel_npy / top_db
mel_tensor: Tensor =...
```
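
The clamp-and-scale step from the snippet above can be pulled out into a small function to see what it does to the values. This is only a sketch: `normalize_mel` is a hypothetical name, and it assumes the input is a log-mel spectrogram in dB whose maximum is around 0 dB, so the output lands roughly in [-1, 0]. Whether that matches what the vocoder saw during training is not confirmed here.

```python
import numpy as np

def normalize_mel(mel_db, top_db=80.0):
    """Clamp a dB-scaled mel spectrogram at -top_db, then divide by top_db."""
    # Anything quieter than -top_db is raised to -top_db.
    clipped = np.maximum(mel_db, -top_db)
    # For inputs that peak at 0 dB this maps the values into [-1, 0].
    return clipped / top_db

example = np.array([-120.0, -80.0, -40.0, 0.0])
print(normalize_mel(example))  # values: -1, -1, -0.5, 0
```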

I'm prepping to run another test with a fork of it. I'm looking in https://github.com/CherokeeLanguage/Cherokee-TTS/blob/master/params/params.py and trying to figure out what to change. I see there is a normalize setting....

I suppose this question also applies to the aligner.

I "augmented" my data with the following script: (https://github.com/CherokeeLanguage/cherokee-audio-data/blob/main/create_augmented.py) Adding in the longer combined sequences has greatly enhanced the quality of the output and so far the loss of syllables...
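
As a rough illustration of the idea of combining short recordings into longer sequences, here is a minimal NumPy sketch. It is not the linked script and may differ from it in every detail. `combine_clips`, the 0.25 second gap, and the float32 waveform format are all assumptions made for the example, and the matching transcripts would need to be joined in the same order.

```python
import numpy as np

def combine_clips(clips, sample_rate, gap_seconds=0.25):
    """Concatenate waveforms into one longer one, with silence between clips."""
    # Hypothetical choice: a short stretch of digital silence between utterances.
    gap = np.zeros(int(sample_rate * gap_seconds), dtype=np.float32)
    parts = []
    for i, clip in enumerate(clips):
        if i > 0:
            parts.append(gap)
        parts.append(np.asarray(clip, dtype=np.float32))
    if not parts:
        return np.zeros(0, dtype=np.float32)
    return np.concatenate(parts)

# Two dummy "single-word" clips at a 1 kHz sample rate.
a = np.ones(100, dtype=np.float32)
b = np.ones(200, dtype=np.float32)
combined = combine_clips([a, b], sample_rate=1000)
print(combined.shape)  # (550,)
```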

Sorry I haven't responded in a while. Family and health issues. I tracked down part of the "overfitting" to be caused by the aligner. It seems that the aligner...

FYI: Here is a link showing usage of IMS-Toucan to help with language preservation by the creation of teaching materials. https://www.youtube.com/channel/UCEvcgGgrC47LwcLgqr01h7A

> in my experience, single-word is hard to train. If there is a way to combine your data into longer utterances, the better.

I do have some longer utterances.