Real-Time-Voice-Cloning
Quality of generated audio
When using a recording from the downloaded LibriSpeech dataset, most of the generated audio clips sound good and accurate. However, whenever I record some audio myself and use that, no matter who the speaker is, all of the generated clips sound the same. Is there any way I can fix this, or am I misunderstanding how to use this tool? I've seen others on YouTube using the tool the same way I am, and their resulting audio clips sound far better than mine.
Can you provide an example? It may be the quality of your recordings or your microphone setup, such as the distance from the speaker or the recording environment.
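As a quick way to rule out obvious recording problems, here is a minimal sketch (assuming librosa and numpy are installed; the file path is a placeholder) that loads a recording as 16 kHz mono, trims silence, and flags clips that are very short or clipping:

```python
# Quick sanity check of a reference recording before cloning.
import librosa
import numpy as np

wav_path = "my_recording.wav"  # replace with your own file

# The speaker encoder in this repo works on 16 kHz mono audio, so load it that way.
wav, sr = librosa.load(wav_path, sr=16000, mono=True)

# Trim leading/trailing silence, which otherwise dilutes the speaker embedding.
wav, _ = librosa.effects.trim(wav, top_db=30)

duration = len(wav) / sr
peak = np.max(np.abs(wav))

print(f"duration after trimming: {duration:.1f} s")
print(f"peak amplitude: {peak:.2f}")

if duration < 5:
    print("Recording is quite short; 5-10 s of clean speech usually works better.")
if peak > 0.99:
    print("Recording appears to be clipping; lower the input gain and re-record.")
```

A short, noisy, or clipped reference clip tends to produce generic-sounding embeddings, which would explain all outputs sounding the same.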
Use single-speaker fine-tuning as described in #437.
#437 has a Dropbox link that no longer exists, so it's hard to reproduce.