Matcha-TTS
Matcha-TTS compared to VITS
I reproduced both VITS and Matcha-TTS on a single-speaker Chinese dataset and found that the timbre similarity of Matcha-TTS is lower than that of VITS. The gap is most visible in the high-frequency details of the spectrum. Below are the spectrograms of VITS and Matcha-TTS. Is there any way to improve the timbre similarity of Matcha-TTS?
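
To put a rough number on the high-frequency gap, something like the sketch below could be used to compare each model's log-mel spectrogram against the ground-truth one. This is only an illustration, not part of either codebase: `band_l1_distance` and `split_bin` are hypothetical names, the cutoff bin is an arbitrary choice, and the inputs are assumed to be time-aligned already (for example after DTW), since the two models predict different durations.

```python
from statistics import fmean


def band_l1_distance(mel_a, mel_b, split_bin):
    """Mean absolute difference between two log-mel spectrograms, per band.

    mel_a, mel_b: sequences of frames, each frame a sequence of log-mel
        values with the same number of bins (nested lists or a 2-D array
        iterated frame by frame). Assumed to be time-aligned; frames past
        the end of the shorter input are ignored.
    split_bin: index of the first mel bin treated as "high frequency"
        (a hypothetical cutoff, pick it to match your mel configuration).
    Returns (low_band_distance, high_band_distance).
    """
    low, high = [], []
    for frame_a, frame_b in zip(mel_a, mel_b):
        diffs = [abs(a - b) for a, b in zip(frame_a, frame_b)]
        low.extend(diffs[:split_bin])    # bins below the cutoff
        high.extend(diffs[split_bin:])   # bins at or above the cutoff
    return fmean(low), fmean(high)


# Toy example: 2 frames, 4 mel bins, bins 2 and 3 counted as "high".
reference = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
synthesized = [[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 3.0]]
low_dist, high_dist = band_l1_distance(reference, synthesized, split_bin=2)
```

If the high-band distance is clearly larger for Matcha-TTS than for VITS while the low-band distances are close, that would support what the spectrograms suggest visually.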