CaraDuf
Hi, I fine-tuned the Toucan Meta model for 1k steps on a reduced dataset to understand the difference between Avocodo and BigVGAN. Here are the spectrograms: apart from the 12kHz...
Hi, I tried to fine-tune the new Meta model on an 87-sample dataset in French that I have already used several times, but now the results are very bad. I...
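A sketch of how such a side-by-side comparison can be plotted; the two file names are placeholders for the Avocodo and BigVGAN outputs:

```python
# Sketch: plot the spectrograms of two vocoder outputs side by side.
# "avocodo.wav" and "bigvgan.wav" are placeholder file names.
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(1, 2, figsize=(12, 4), sharey=True)
for ax, path in zip(axes, ["avocodo.wav", "bigvgan.wav"]):
    wave, sr = librosa.load(path, sr=None)  # keep the native sample rate
    spec_db = librosa.amplitude_to_db(np.abs(librosa.stft(wave)), ref=np.max)
    img = librosa.display.specshow(spec_db, sr=sr, x_axis="time", y_axis="hz", ax=ax)
    ax.set_title(path)
fig.colorbar(img, ax=axes, format="%+2.0f dB")
plt.show()
```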
Hi, I want to check what the scorer has to say about my dataset and why it is keeping only 77 samples out of 98 (which all sound ok to...
Hi, In a previous [answer](https://github.com/DigitalPhonetics/IMS-Toucan/issues/109#issuecomment-1475011361) you wrote that you were looking for ways to improve training speed even though you were already satisfied with Toucan's training performance. Have you ever...
Hi, I tried `run_utterance_cloner` and noticed very bad results when the transcription text does not match the reference audio. In another project I tried (Coqui), which also does voice...
Hi, Given a target speaker dataset, roughly how many fine-tuning steps should be run? [NeMo](https://github.com/NVIDIA/NeMo/blob/main/tutorials/tts/FastPitch_Finetuning.ipynb) "recommends 1000 steps per minute of audio" for FastPitch...
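For concreteness, the NeMo rule of thumb boils down to simple arithmetic; whether the same ratio carries over to Toucan is exactly the open question:

```python
# NeMo's FastPitch rule of thumb: ~1000 fine-tuning steps per minute of audio.
# Whether this ratio transfers to Toucan is an assumption to be tested.
def recommended_steps(total_audio_seconds: float, steps_per_minute: int = 1000) -> int:
    return round(total_audio_seconds / 60 * steps_per_minute)

# e.g. roughly 5 minutes of target speaker audio
print(recommended_steps(5 * 60))  # 5000
```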
Hi, I merged all my single-speaker datasets into a bigger one and fine-tuned the Meta model on it. Now at inference the output sounds like a mixture of all...
Hi, Out of curiosity, I want to test BigVGAN. On their [page](https://github.com/NVIDIA/BigVGAN) they say that it accepts `.npy` as input. I browsed the code but could not find where the...
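For what it is worth, the `.npy` input is just a mel spectrogram saved with NumPy. A minimal sketch, assuming `mel` is the mel tensor produced before vocoding (the shape and file name below are only illustrations):

```python
# Minimal sketch: dump a mel spectrogram to .npy for BigVGAN's mel-input inference.
# `mel` is assumed to already be a [n_mels, frames] torch tensor; the values
# below are a stand-in, not real model output.
import numpy as np
import torch

mel = torch.randn(80, 400)  # placeholder for the acoustic model output
np.save("sample_0001.npy", mel.detach().cpu().numpy())
```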
Hi, When using the `read_texts` function, how long should the `speaker_reference` be, and what should it be like to give the best results? By "how long" I mean its duration in seconds...
Hi, I am running training in one remote terminal and running inference with the current model in another one. Sometimes I test the current model and to...
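One way to experiment with the duration question is to trim the same recording to different lengths before passing it as `speaker_reference`. A hypothetical helper (the 10 s default is just a value to try):

```python
# Hypothetical helper: trim a reference clip to a target duration so different
# lengths of the same recording can be compared as speaker_reference.
import soundfile as sf

def trim_reference(in_path: str, out_path: str, seconds: float = 10.0) -> str:
    audio, sr = sf.read(in_path)
    sf.write(out_path, audio[: int(seconds * sr)], sr)
    return out_path

# e.g. trim_reference("speaker_full.wav", "speaker_10s.wav", seconds=10.0)
```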
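One way to keep the quick test from interfering with the ongoing run, as a sketch (the checkpoint path is a placeholder for wherever your run writes its weights):

```python
# Sketch: copy the latest checkpoint and load the copy on CPU, so the quick
# test neither reads a file the trainer is still writing nor competes with it
# for GPU memory. The path below is a placeholder.
import shutil
import torch

CKPT = "Models/ToucanTTS_Finetune/best.pt"  # placeholder for your run's checkpoint
COPY = "/tmp/toucan_inference_snapshot.pt"

shutil.copyfile(CKPT, COPY)
state = torch.load(COPY, map_location="cpu")
```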