Michael Conrad
I've been working on TTS stuff and haven't had time to figure it out.
I've also tried the following and now I'm getting "RuntimeError: input.size(-1) must be equal to input_size. Expected 80, got 386":

```python
mel_npy: array = numpy.load(os.path.join(cwd, "tmp.npy"))
mel_npy = mel_npy.reshape((1, mel_npy.shape[0],...
```
I finally figured out that it needed a transpose, but the generated wav is all silence?

```python
mel_npy: array = numpy.load(os.path.join(cwd, "tmp.npy")).transpose()
mel_npy = mel_npy.reshape((1, mel_npy.shape[0], mel_npy.shape[1]))
mel_tensor: Tensor = torch.tensor(mel_npy).to("cuda")...
```
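For anyone hitting the same shape error: the message quoted earlier suggests the model wants the mel bins (80) on the last axis, while the saved array apparently had the 386 frames there. Here is a minimal sketch of a shape check, assuming a 2-D array stored as either (n_mels, frames) or (frames, n_mels). The helper name `to_batch_first_mel` is hypothetical and not part of UniversalVocoding.

```python
import numpy as np

N_MELS = 80  # expected feature size, taken from the RuntimeError message


def to_batch_first_mel(mel: np.ndarray, n_mels: int = N_MELS) -> np.ndarray:
    """Return mel as (1, frames, n_mels), transposing if stored as (n_mels, frames)."""
    if mel.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {mel.shape}")
    if mel.shape[-1] != n_mels:
        if mel.shape[0] != n_mels:
            raise ValueError(f"no axis of size {n_mels} in shape {mel.shape}")
        mel = mel.T  # move the mel bins to the last axis
    return mel[np.newaxis, :, :]  # add the batch dimension


example = np.zeros((80, 386), dtype=np.float32)  # placeholder data, not a real mel
print(to_batch_first_mel(example).shape)  # (1, 386, 80)
```

This only fixes the layout. It says nothing about whether the values are scaled the way the vocoder expects, which may be why the output was silent.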
The following seems to work. Definitely different sounding... [universalvocoding.zip](https://github.com/bshall/UniversalVocoding/files/7355747/universalvocoding.zip)

```python
mel_npy: array = numpy.load(os.path.join(cwd, "tmp.npy")).transpose()
top_db = 80
mel_npy = numpy.maximum(mel_npy, -top_db)
mel_npy = mel_npy / top_db
mel_tensor: Tensor =...
```
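To spell out what the clipping and scaling in that snippet do, here is a small standalone sketch. It assumes the mel values are in dB relative to the peak (so at most 0), in which case the result lands in [-1, 0]. Whether that matches the preprocessing the vocoder was trained with is an assumption on my part, and `normalize_log_mel` is a hypothetical helper name.

```python
import numpy as np


def normalize_log_mel(mel_db: np.ndarray, top_db: float = 80.0) -> np.ndarray:
    """Clip a dB-scaled mel spectrogram at -top_db, then scale by top_db."""
    clipped = np.maximum(mel_db, -top_db)  # floor everything below -top_db
    return clipped / top_db  # values <= 0 dB end up in [-1, 0]


toy = np.array([[-100.0, -40.0, 0.0]])  # made-up dB values for illustration
print(normalize_log_mel(toy).tolist())  # [[-1.0, -0.5, 0.0]]
```

Values above 0 dB are not clipped by this, so an input on a different scale would fall outside [-1, 0].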
I'm prepping to run another test with a fork of it. I'm looking in https://github.com/CherokeeLanguage/Cherokee-TTS/blob/master/params/params.py and trying to figure out what to change. I see there is a normalize setting....
I suppose this question also applies to the aligner.
I "augmented" my data with the following script: (https://github.com/CherokeeLanguage/cherokee-audio-data/blob/main/create_augmented.py) Adding in the longer combined sequences has greatly enhanced the quality of the output and so far the loss of syllables...
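For illustration only, here is a rough sketch of the general idea of combining short clips into longer training utterances. This is not the linked `create_augmented.py`. The function name, the sample rate, and the length of the silent gap are all assumptions.

```python
import numpy as np


def combine_utterances(clips, texts, sample_rate=22050, gap_seconds=0.25):
    """Join short clips into one longer utterance, with silence between them."""
    if not clips or len(clips) != len(texts):
        raise ValueError("need one text per clip and at least one clip")
    gap = np.zeros(int(sample_rate * gap_seconds), dtype=np.float32)
    pieces = []
    for i, clip in enumerate(clips):
        if i > 0:
            pieces.append(gap)  # silence between consecutive clips
        pieces.append(np.asarray(clip, dtype=np.float32))
    # The combined transcript is the texts joined in the same order.
    return np.concatenate(pieces), " ".join(texts)


# Placeholder waveforms and transcripts, not real data.
audio, text = combine_utterances(
    [np.ones(100), np.ones(50)], ["first", "second"], sample_rate=1000, gap_seconds=0.1
)
print(len(audio), text)  # 250 first second
```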
Sorry I haven't responded in a while. Family and health issues. I tracked down part of the "overfitting" to the aligner. It seems that the aligner...
FYI: Here is a link showing usage of IMS-Toucan to help with language preservation by the creation of teaching materials. https://www.youtube.com/channel/UCEvcgGgrC47LwcLgqr01h7A
> in my experience, single-word is hard to train. If there is a way to combine your data into longer utterances, the better.

I do have some longer utterances.