Adding another language along with English
Hey guys,
I am using RealtimeTTS and it is working superbly; I am currently using English.
Can someone help me properly add Hindi as well, so that it can respond in both English and Hindi depending on the LLM's text response?
from TTS.api import TTS
# Load the model
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2", gpu=True)
# Generate speech by cloning a voice using default settings
tts.tts_to_file(
    text="मेरे लिए एक अनूठी आवाज विकसित करने में कई वर्षों का कठिन परिश्रम, गहन अनुसंधान, और निरंतर अभ्यास लगा, जिसके बाद अब जब मैंने अपनी विशिष्ट पहचान पाकर दुनिया के सामने अपनी प्रतिभा को उजागर किया है, तो मैं मौन नहीं रहूंगा और अपनी आवाज़ से हर दिल को छू जाऊंगा।",
    file_path="output.wav",
    speaker_wav="shak.wav",
    language="hi"
)
I used xtts_v2 and it generates the Hindi voice properly, but it is not realtime, which is why I want to use RealtimeTTS. I think we can do the same in RealtimeTTS, since it also has the option to use the Coqui engine.
If anyone can help with this, it would be very helpful.
Thanks 😊
Please try this:
"""
1. Create and activate venv:
python -m venv venv
venv\Scripts\activate.bat
2. Install dependencies:
pip install realtimetts[coqui]
3. Install PyTorch with CUDA 12.1 support and DeepSpeed for faster processing:
pip install torch==2.1.2+cu121 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121
pip install https://github.com/daswer123/deepspeed-windows-wheels/releases/download/11.2/deepspeed-0.11.2+cuda121-cp310-cp310-win_amd64.whl
4. Create a folder named "model" and place the Hindi model files inside it
- for this example you need these files: model.pth, config.json, vocab.json, speakers_xtts.pth and speakers-hi_train_hindifemale_01305.wav
- you can download the files from this Hugging Face model repository: https://huggingface.co/Abhinay45/XTTS-Hindi-finetuned/tree/main
- adjust model files, reference voice and paths as needed
"""
if __name__ == "__main__":
    print("Coqui TTS Test")
    print("This is a test for the Coqui TTS engine with a local Hindi model.")
    print("The model is located in the 'model' folder.")
    print("The Hindi voice reference is 'model/speakers-hi_train_hindifemale_01305.wav'.")
    print("The output will be saved as 'output.wav'.")
    print()

    print("Importing necessary modules...")
    from RealtimeTTS import TextToAudioStream, CoquiEngine

    def dummy_generator():
        # Using Hindi sample text for synthesis.
        yield "नमस्ते, यह एक परीक्षण संदेश है। "
        yield "यहाँ हिंदी में टेक्स्ट टू स्पीच का उपयोग करके आवाज़ बनाई जा रही है।"

    import logging
    logging.basicConfig(level=logging.DEBUG)

    print("Initializing CoquiEngine...")
    engine = CoquiEngine(
        specific_model="model",
        local_models_path=".",
        voice="model/speakers-hi_train_hindifemale_01305.wav",
        language="hi",
        level=logging.DEBUG,
        use_deepspeed=True,
    )
    print("CoquiEngine initialized")

    print("Creating TextToAudioStream...")
    stream = TextToAudioStream(engine)

    print("Starting to play stream")
    stream.feed(dummy_generator()).play(log_synthesized_text=True, output_wavfile="output.wav")
    print("Playout finished")

    engine.shutdown()
Thanks a lot man, it worked perfectly!
Can you help me with one more problem?
I have built an assistant using ollama and RealtimeTTS, and it is currently working exceptionally well with English as the default language. Now, I want to extend its functionality by adding support for Hindi. Specifically, when the response generated by the LLM is in Hindi, the assistant should automatically switch to a Hindi voice model, and when the response is in English, it should continue using the English voice model—without any noticeable delay or drop in performance. I attempted to implement this functionality using a previous code snippet for Hindi support, but I wasn’t successful in integrating both languages into a single, seamless setup. I would really appreciate help in modifying my current RealtimeTTS code so that it can handle both English and Hindi responses dynamically, ensuring smooth and natural voice output in either language.
import os
import time
import torch
import RealtimeTTS


def combined_realtime_text_generator():
    """
    Instead of yielding very short segments, this generator accumulates
    text for a short duration (e.g., 0.3 seconds) and then yields the combined
    text. This helps maintain continuous audio without abrupt gaps.
    """
    texts = [
        "Hello, this is real-time TTS speaking. ",
        "Every sentence is synthesized as soon as it is ready. ",
        "The voice is generated using a local, neural cloned model. "
    ]
    combined = ""
    for text in texts:
        combined += text
        time.sleep(0.1)  # accumulate text segments (adjust delay as needed)
    # Yield once after accumulating, so earlier segments are not repeated.
    yield combined


if __name__ == "__main__":
    # Check for CUDA support
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using device: {device}")

    # Create a "voices" folder in the same folder as this script and put a .wav audio file of the voice you want to clone.
    # You need a 10 to 30 second sample, 44100 Hz or 22050 Hz mono 32-bit float WAV file for best results.
    # The first time a new voice sample is used, Coqui will generate a new file named YOUR_VOICE_SAMPLE_NAME.json in the voices folder.
    stream = RealtimeTTS.TextToAudioStream(RealtimeTTS.CoquiEngine(language="en", voice="./voices/[YOUR_VOICE_SAMPLE_NAME.wav]"))

    print("Starting realtime TTS streaming...")
    # Feed the combined text from our generator to produce continuous speech.
    stream.feed(combined_realtime_text_generator()).play(log_synthesized_text=True)

    # Wait until playback completes.
    while stream.is_playing():
        time.sleep(0.05)
    print("Playback finished.")
Thanks again 😊
Hey, can someone help with this issue? 🥲
CoquiEngine currently lacks support for switching the language at runtime. Adding this code would help (will release soon): in _synthesize_worker:
elif command == "set_language":
    language = data["language"]
    conn.send(("success", "Language updated successfully"))
As a new method:
def set_language(self, language: str):
    """
    Sets the language to be used for speech synthesis.

    Args:
        language (str): New language code to use (e.g., "en", "es", etc.)
    """
    self.language = language
    self.send_command("set_language", {"language": language})
    status, result = self.parent_synthesize_pipe.recv()
    if status == "success":
        logging.info("Language updated successfully")
    else:
        logging.error("Error updating language")
    return status, result
You probably want to detect the language of the text returned from the LLM somehow; there are Python helper libraries for this (e.g. langdetect).
Then first detect the language and switch CoquiEngine to that language. If you also need to switch the model, there is set_model, but switching the model takes a few seconds. The only way to avoid this would be loading two CoquiEngine instances with different models at the same time, but that needs double the VRAM.
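For illustration, a minimal detection helper could look like this (just a sketch; the detect_tts_language name and the English fallback are assumptions, not part of RealtimeTTS):

from langdetect import detect, DetectorFactory, LangDetectException

DetectorFactory.seed = 0  # make detection deterministic across runs

def detect_tts_language(text: str, fallback: str = "en") -> str:
    # langdetect returns ISO 639-1 codes ("en", "hi", ...), which match the
    # language codes CoquiEngine expects for English and Hindi.
    try:
        code = detect(text)
    except LangDetectException:
        return fallback  # empty or undecidable text
    return code if code in ("en", "hi") else fallback

print(detect_tts_language("Hello, how are you today?"))  # -> en
print(detect_tts_language("नमस्ते, आप आज कैसे हैं?"))  # -> hi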
Thanks for answering.
Can you suggest an alternative that supports realtime language switching? 😊
The latest version from earlier today can do this now. Please do pip install -U realtimetts[coqui]; then you can switch the language at runtime with the set_language method of CoquiEngine.
Hey, thanks a lot for this update. Can you please explain how I can do that, and if you have example code that supports language switching at runtime, it would be a great help 😊
engine = CoquiEngine()
engine.set_language("de")  # German
engine.set_language("es")  # Spanish
Supported languages: English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Turkish (tr), Russian (ru), Dutch (nl), Czech (cs), Arabic (ar), Chinese (zh-cn), Japanese (ja), Hungarian (hu), Korean (ko), Hindi (hi).
So will the TTS autodetect the language of the response, or do I have to add langdetect?
You need to add langdetect or something similar and switch the language (and voice reference) depending on the result.
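As a rough sketch of wiring this together (the voice paths are placeholders, and set_voice is assumed here to accept a reference wav path, as in the earlier examples):

from RealtimeTTS import TextToAudioStream, CoquiEngine
from langdetect import detect

# Hypothetical reference wavs, one per language the assistant should speak.
VOICES = {
    "en": "./voices/english_speaker.wav",
    "hi": "model/speakers-hi_train_hindifemale_01305.wav",
}

if __name__ == "__main__":
    engine = CoquiEngine(language="en", voice=VOICES["en"])
    stream = TextToAudioStream(engine)

    def speak(llm_response: str):
        lang = detect(llm_response)
        lang = lang if lang in VOICES else "en"  # fall back to English
        engine.set_language(lang)                # runtime switch, no model reload
        engine.set_voice(VOICES[lang])           # matching reference voice
        stream.feed(llm_response).play()

    speak("Hello, this sentence should come out with the English voice.")
    speak("यह वाक्य हिंदी आवाज़ में बोला जाना चाहिए।")

    engine.shutdown()

Feeding and playing work the same as in the earlier examples; only the language and voice reference change between responses.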
So do I also need to add two models, one for Hindi and one for English?
Hey, I tried this but it is showing this error:
WARNING:root:engine coqui is the only engine available, can't switch to another engine
Can you provide proper working code, because I am new to this type of stuff 🥲