StyleTTSEngine language in phonemizer is hardcoded to English (en-us)
Hi, I noticed that in StyleTTSEngine, the language used in the phonemizer is hardcoded to English (en-us), which prevents using the engine with models trained for other languages.
Specifically, this line in the code:
# Initialize phonemizer
self.global_phonemizer = phonemizer.backend.EspeakBackend(language='en-us',
                                                          preserve_punctuation=True,
                                                          with_stress=True)
I have a fine-tuned model that speaks Spanish, but I can't use it properly with StyleTTSEngine because of this limitation. It would be great if the phonemizer language could be configurable or inferred from the model's settings.
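One way the request could look is sketched below. It is only an illustration: `build_phonemizer_kwargs` and `DEFAULT_LANGUAGE` are hypothetical names, not part of StyleTTSEngine, and how the engine would actually expose the setting is up to the maintainer. The helper only assembles the keyword arguments already shown in the snippet above, with the language as a parameter instead of a constant.

```python
# Hypothetical helper: not part of StyleTTSEngine or phonemizer.
DEFAULT_LANGUAGE = "en-us"  # the value currently hardcoded in the engine


def build_phonemizer_kwargs(language=DEFAULT_LANGUAGE, **overrides):
    """Return keyword arguments for phonemizer.backend.EspeakBackend.

    `language` is an espeak language code such as "en-us" or "es".
    Any extra keyword arguments override the defaults below.
    """
    kwargs = {
        "language": language,
        "preserve_punctuation": True,
        "with_stress": True,
    }
    kwargs.update(overrides)
    return kwargs


# Assumed usage inside the engine (needs phonemizer and espeak installed):
#   self.global_phonemizer = phonemizer.backend.EspeakBackend(
#       **build_phonemizer_kwargs(language="es"))
```

Keeping "en-us" as the default would leave existing English setups unchanged while letting a Spanish model pass "es".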
Thanks!
I'll take care of this in the next release. Do you by any chance know of any good resources on how to fine-tune for another language? I tried German but have failed so far...
@KoljaB
I am using StyleTTS2 with the RealtimeTTS engine for my native language, Hindi, with a model I fine-tuned on my own data. I am currently facing an issue with language_switch: the phonemized output still contains the language-switch flags:
( hi ) nəmˈʌsteː dˈʊnɪjˌaː ( en-us ) , ( hi ) sʈˌaːɪlʈiʈiˈeːs ˈɔɖɪjˌoː kaː pəɾˈiːkʃəɳ kˈɪjaː ɟˈaː ɾˌəhaː hɛː ( en-us )
I tried to remove them this way:
self.global_phonemizer = phonemizer.backend.EspeakBackend(
    language='hi',
    preserve_punctuation=True,
    with_stress=False,
    language_switch='remove-flags',  # This should remove the flags
    words_mismatch='ignore',
    # Add these additional settings to suppress warnings
    punctuation_marks=';:,.!?¡¿—…"«»""',
    strip=True
)
Even with language_switch='remove-flags' set, the text is still synthesized with the flags, exactly as in the output above.
What am I doing wrong?
I have also created a StyleTTS2 TTS service class for pipecat, backed by the RealtimeTTS engine.
Full logs:
2025-05-29 11:36:16.996 | INFO | __main__:start:159 - StyleTTSService#0: TextToAudioStream created successfully
2025-05-29 11:36:16.996 | INFO | __main__:start:166 - StyleTTSService#0: StyleTTS initialization completed successfully in 8.82s
2025-05-29 11:36:16.996 | INFO | __main__:run_tts:226 - StyleTTSService#0: Starting TTS generation for: [नमस्ते दुनिया, स्टाइलटीटीएस ऑडियो का परीक्षण किया ...]
2025-05-29 11:36:16.996 | DEBUG | __main__:run_tts:241 - StyleTTSService#0: Processing text: [नमस्ते दुनिया, स्टाइलटीटीएस ऑडियो का परीक्षण किया जा रहा है]
2025-05-29 11:36:16.996 | DEBUG | __main__:run_tts:249 - StyleTTSService#0: Starting audio streaming...
2025-05-29 11:36:16.996 | DEBUG | __main__:_stream_audio_realtime:269 - StyleTTSService#0: Setting up 200ms buffered audio streaming for text: [नमस्ते दुनिया, स्टाइलटीटीएस ऑड...]
2025-05-29 11:36:16.996 | INFO | __main__:run_synthesis:308 - StyleTTSService#0: Starting synthesis thread for text: [नमस्ते दुनिया, स्टाइलटीटीएस ऑड...]
⚡ synthesizing → 'नमस्ते दुनिया, स्टाइलटीटीएस ऑडियो का परीक्षण किया जा रहा है'
WARNING:phonemizer:2 utterances containing language switches on lines 1, 2
WARNING:phonemizer:extra phones may appear in the "en-us" phoneset
WARNING:phonemizer:language switch flags have been kept (applying "keep-flags" policy)
( hi ) nəmˈʌsteː dˈʊnɪjˌaː ( en-us ) , ( hi ) sʈˌaːɪlʈiʈiˈeːs ˈɔɖɪjˌoː kaː pəɾˈiːkʃəɳ kˈɪjaː ɟˈaː ɾˌəhaː hɛː ( en-us )
New Padding length bert_dur_2: 109
SYNTHESIS FINISHED
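The warnings in the log above mention the "keep-flags" policy and the "en-us" phoneset, which suggests (as a reading of the log, not a confirmed diagnosis) that the backend used for this run was still the default English one rather than the modified configuration. Independently of that, a possible stopgap is to strip the flags from the phonemized string before it reaches the model. The sketch below assumes the flags always look like the ones in the log, a lowercase language code in parentheses with optional spaces; `strip_language_flags` is a hypothetical helper, not a phonemizer or RealtimeTTS function.

```python
import re

# Matches flags such as "( hi )", "(en-us)" or "( en-gb-x-rp )".
# Assumption: a flag is a lowercase language code inside parentheses.
_LANGUAGE_FLAG = re.compile(r"\(\s*[a-z]{2,3}(?:-[a-z0-9]+)*\s*\)")


def strip_language_flags(phonemes: str) -> str:
    """Remove language-switch flags and collapse the leftover whitespace."""
    without_flags = _LANGUAGE_FLAG.sub(" ", phonemes)
    return re.sub(r"\s+", " ", without_flags).strip()


print(strip_language_flags("( hi ) abc ( en-us ) , ( hi ) def ( en-us )"))
# abc , def
```

Note that this would also remove any genuine parenthesised two- or three-letter lowercase token, so it is best treated as a temporary workaround.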
I'm sorry, I currently have no time for anything. It may take a while until I can look into this.