soft-vc
skipped phonemes in generated audio
Hi, thank you for sharing your code.
I am trying to do voice conversion from English speech to a Vietnamese speaker. To do that, I followed these steps:
- extract units for both the English and Vietnamese datasets
- train kmeans on both types of units & extract discrete labels
- train soft encoder
- extract soft units
- train acoustic model
- train hifigan on Vietnamese dataset
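
To make the k-means step concrete, here is a minimal sketch of pooling units from both languages into one codebook and then checking how each language uses the clusters. It is not the script I ran or the repo's code: `fit_kmeans` and `assign_labels` are hypothetical helpers, the plain NumPy Lloyd's loop stands in for a library k-means, the random arrays stand in for real frame-level features, and the sizes (16 dimensions, 10 clusters) are toy values.

```python
import numpy as np

def assign_labels(features, centroids):
    """Index of the nearest centroid (squared Euclidean distance) per frame."""
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def fit_kmeans(features, n_clusters, n_iters=20, seed=0):
    """Plain Lloyd's k-means; returns centroids of shape (n_clusters, dim)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from randomly chosen frames.
    init_idx = rng.choice(len(features), n_clusters, replace=False)
    centroids = features[init_idx].copy()
    for _ in range(n_iters):
        labels = assign_labels(features, centroids)
        for k in range(n_clusters):
            members = features[labels == k]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[k] = members.mean(axis=0)
    return centroids

# Assumption: random arrays stand in for extracted units (frames x dim).
rng = np.random.default_rng(0)
english_feats = rng.normal(size=(500, 16))
vietnamese_feats = rng.normal(size=(500, 16))

# Pool both languages so a single codebook is fitted on all frames.
pooled = np.concatenate([english_feats, vietnamese_feats], axis=0)
centroids = fit_kmeans(pooled, n_clusters=10)

# Discrete labels for each language, taken from the shared codebook.
english_labels = assign_labels(english_feats, centroids)
vietnamese_labels = assign_labels(vietnamese_feats, centroids)

# Per-language cluster usage counts, useful for spotting clusters that
# one language rarely or never uses.
english_usage = np.bincount(english_labels, minlength=10)
vietnamese_usage = np.bincount(vietnamese_labels, minlength=10)
print(english_usage, vietnamese_usage)
```

Comparing the two usage histograms on real data would show whether the English frames are spread across the codebook or squeezed into a few clusters.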
The output for Vietnamese speech (the input audio is Vietnamese, from a different speaker) is okay, but the output for English input is not that good: phonemes are often skipped or mispronounced. Do you have any suggestions on how I can improve the results?