sherpa-onnx
Incorrect phoneme handling (Kokoro-TTS)
I've noticed that the Kokoro-TTS demo hosted on Hugging Face Spaces handles phonemes exceptionally well, correctly distinguishing cases such as:
- "read" (past vs. present tense)
- "a project" vs. "to project"
However, the Sherpa-ONNX version does not seem to reach the same level of phoneme accuracy. The reference Kokoro-TTS implementation uses the Misaki G2P, but I'm unsure how phonemes are generated in Sherpa-ONNX or why the results differ.
For reference, I'm implementing this in Flutter and using the following model:
➡️ kokoro-multi-lang-v1_0.tar.bz2
Questions:
- Is there a way to enable Misaki G2P in Sherpa-ONNX?
- If not, what method does Sherpa-ONNX use for phoneme generation?
- Since correct pronunciation depends on context, how can I improve phoneme accuracy? The lexicon file alone doesn't seem sufficient, presumably because a lexicon maps each word to one fixed pronunciation regardless of the surrounding words.
- Could you clarify whether, and how, the gold/silver/bronze ranking system is implemented in this model?
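To make the lexicon question concrete, here is a toy Python sketch (not Dart/Flutter, and not the Sherpa-ONNX or Misaki API) contrasting a plain word-to-phoneme lookup with a context-aware one. The tables, the approximate IPA strings, the labels, and the one-word-lookback rule are all hypothetical stand-ins; a real G2P would typically use a part-of-speech tagger or a trained model rather than a hand-written rule.

```python
# Toy illustration only: hypothetical tables and rules, NOT the Sherpa-ONNX or Misaki API.

# A plain lexicon: exactly one pronunciation per word (approximate IPA).
LEXICON = {
    "a": "ə",
    "to": "tə",
    "i": "aɪ",
    "will": "wɪl",
    "have": "hæv",
    "project": "pɹˈɑdʒɛkt",
    "read": "ɹˈiːd",
}

# Heteronyms: the pronunciation depends on a coarse, hypothetical context label.
HETERONYMS = {
    "project": {"NOUN": "pɹˈɑdʒɛkt", "VERB": "pɹədʒˈɛkt"},
    "read": {"PRESENT": "ɹˈiːd", "PAST": "ɹˈɛd"},
}


def lexicon_only(words):
    """Context-free lookup: every occurrence of a word gets the same phonemes."""
    return [LEXICON[w] for w in words]


def context_label(words, i):
    """Naive one-word lookback rule standing in for a real POS tagger (hypothetical)."""
    prev = words[i - 1] if i > 0 else ""
    w = words[i]
    if w == "project":
        # "to project" -> verb; anything else (e.g. "a project") -> noun.
        return "VERB" if prev == "to" else "NOUN"
    if w == "read":
        # "have/has/had read" -> past participle; otherwise assume present tense.
        return "PAST" if prev in {"have", "has", "had"} else "PRESENT"
    return None


def context_aware(words):
    """Use the heteronym table when a context label is available, else the lexicon."""
    out = []
    for i, w in enumerate(words):
        label = context_label(words, i)
        if w in HETERONYMS and label is not None:
            out.append(HETERONYMS[w][label])
        else:
            out.append(LEXICON[w])
    return out


if __name__ == "__main__":
    for phrase in ["a project", "to project", "i will read", "i have read"]:
        words = phrase.split()
        print(phrase, lexicon_only(words), context_aware(words))
```

With `lexicon_only`, "a project" and "to project" produce identical phonemes for "project", which matches what I seem to be hearing; only a step that looks at neighbouring words can separate them.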
Any insights would be greatly appreciated!