Adding another language along with English
Hey guys,
I am using RealtimeTTS and it is working superbly; I am currently using English.
Can someone help me properly add Hindi as well, so that it can respond in both English and Hindi depending on the LLM's text response?
from TTS.api import TTS
# Load the model
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2", gpu=True)
# Generate speech by cloning a voice using default settings
tts.tts_to_file(
    text="मेरे लिए एक अनूठी आवाज विकसित करने में कई वर्षों का कठिन परिश्रम, गहन अनुसंधान, और निरंतर अभ्यास लगा, जिसके बाद अब जब मैंने अपनी विशिष्ट पहचान पाकर दुनिया के सामने अपनी प्रतिभा को उजागर किया है, तो मैं मौन नहीं रहूंगा और अपनी आवाज़ से हर दिल को छू जाऊंगा।",
    file_path="output.wav",
    speaker_wav="shak.wav",
    language="hi"
)
I used xtts_v2 and it generates the Hindi voice properly, but it is not realtime, which is why I want to use RealtimeTTS. I think we can do the same in RealtimeTTS, since it also has the option to use the Coqui engine.
If anyone can help with this, it would be very helpful.
Thanks 😊
Please try this:
"""
1. Create and activate venv:
python -m venv venv
venv\Scripts\activate.bat
2. Install dependencies:
pip install realtimetts[coqui]
3. Install PyTorch with CUDA 12.1 support and DeepSpeed for faster processing:
pip install torch==2.1.2+cu121 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121
pip install https://github.com/daswer123/deepspeed-windows-wheels/releases/download/11.2/deepspeed-0.11.2+cuda121-cp310-cp310-win_amd64.whl
4. Create a folder named "model" and place the Hindi model files inside it
- for this example you need these files: model.pth, config.json, vocab.json, speakers_xtts.pth and speakers-hi_train_hindifemale_01305.wav
- you can download the files from this Hugging Face model repository: https://huggingface.co/Abhinay45/XTTS-Hindi-finetuned/tree/main
- adjust model files, reference voice and paths as needed
"""
if __name__ == "__main__":
    print("Coqui TTS Test")
    print("This is a test for the Coqui TTS engine with a local Hindi model.")
    print("The model is located in the 'model' folder.")
    print("The Hindi voice reference is 'model/speakers-hi_train_hindifemale_01305.wav'.")
    print("The output will be saved as 'output.wav'.")
    print()

    print("Importing necessary modules...")
    from RealtimeTTS import TextToAudioStream, CoquiEngine

    def dummy_generator():
        # Using Hindi sample text for synthesis.
        yield "नमस्ते, यह एक परीक्षण संदेश है। "
        yield "यहाँ हिंदी में टेक्स्ट टू स्पीच का उपयोग करके आवाज़ बनाई जा रही है।"

    import logging
    logging.basicConfig(level=logging.DEBUG)

    print("Initializing CoquiEngine...")
    engine = CoquiEngine(
        specific_model="model",
        local_models_path=".",
        voice="model/speakers-hi_train_hindifemale_01305.wav",
        language="hi",
        level=logging.DEBUG,
        use_deepspeed=True,
    )
    print("CoquiEngine initialized")

    print("Creating TextToAudioStream...")
    stream = TextToAudioStream(engine)

    print("Starting to play stream")
    stream.feed(dummy_generator()).play(log_synthesized_text=True, output_wavfile="output.wav")
    print("Playout finished")

    engine.shutdown()
Thanks a lot man, it worked perfectly!
Can you help me with one more problem?
I have built an assistant using ollama and RealtimeTTS, and it is currently working exceptionally well with English as the default language. Now, I want to extend its functionality by adding support for Hindi. Specifically, when the response generated by the LLM is in Hindi, the assistant should automatically switch to a Hindi voice model, and when the response is in English, it should continue using the English voice model—without any noticeable delay or drop in performance. I attempted to implement this functionality using a previous code snippet for Hindi support, but I wasn’t successful in integrating both languages into a single, seamless setup. I would really appreciate help in modifying my current RealtimeTTS code so that it can handle both English and Hindi responses dynamically, ensuring smooth and natural voice output in either language.
import os
import time
import torch
import RealtimeTTS


def combined_realtime_text_generator():
    """
    Instead of yielding very short segments, this generator accumulates
    text for a short duration (e.g., 0.3 seconds) and then yields the combined
    text. This helps maintain continuous audio without abrupt gaps.
    """
    texts = [
        "Hello, this is real-time TTS speaking. ",
        "Every sentence is synthesized as soon as it is ready. ",
        "The voice is generated using a local, neural cloned model. "
    ]
    combined = ""
    for text in texts:
        combined += text
        time.sleep(0.1)  # accumulate text segments (adjust delay as needed)
    # Yield once after accumulating, so earlier segments are not repeated.
    yield combined


if __name__ == "__main__":
    # Check for CUDA support
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using device: {device}")

    # Create a "voices" folder in the same folder as this script and put a .wav audio file of the voice you want to clone.
    # You need a 10 to 30 second sample, 44100 Hz or 22050 Hz mono 32-bit float WAV file for best results.
    # The first time a new voice sample is used, Coqui will generate a new file named YOUR_VOICE_SAMPLE_NAME.json in the voices folder.
    stream = RealtimeTTS.TextToAudioStream(RealtimeTTS.CoquiEngine(language="en", voice="./voices/[YOUR_VOICE_SAMPLE_NAME.wav]"))

    print("Starting realtime TTS streaming...")
    # Feed the combined text from our generator to produce continuous speech.
    stream.feed(combined_realtime_text_generator()).play(log_synthesized_text=True)

    # Wait until playback completes.
    while stream.is_playing():
        time.sleep(0.05)
    print("Playback finished.")
Thanks again 😊
Hey, can someone help with this issue? 🥲
CoquiEngine currently lacks support for switching the language at runtime. Adding this code would help (will release soon): in _synthesize_worker:
elif command == "set_language":
    language = data["language"]
    conn.send(("success", "Language updated successfully"))
As a new method:
def set_language(self, language: str):
    """
    Sets the language to be used for speech synthesis.

    Args:
        language (str): New language code to use (e.g., "en", "es", etc.)
    """
    self.language = language
    self.send_command("set_language", {"language": language})
    status, result = self.parent_synthesize_pipe.recv()
    if status == "success":
        logging.info("Language updated successfully")
    else:
        logging.error("Error updating language")
    return status, result
You probably want to detect the language of the text returned from the LLM somehow; there are Python helper libraries for this (e.g. langdetect).
Then first detect the language and switch CoquiEngine to that language. If you also need to switch the model, there is set_model, but switching the model takes a few seconds. The only way to avoid this would be loading two CoquiEngine instances with different models at the same time, but that needs double the VRAM.
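For illustration, a minimal detection helper could look like this (just a sketch; the detect_tts_language name and the English fallback are assumptions, not part of RealtimeTTS):

from langdetect import detect, DetectorFactory, LangDetectException

DetectorFactory.seed = 0  # make detection deterministic across runs

def detect_tts_language(text: str, fallback: str = "en") -> str:
    # langdetect returns ISO 639-1 codes ("en", "hi", ...), which match the
    # language codes CoquiEngine expects for English and Hindi.
    try:
        code = detect(text)
    except LangDetectException:
        return fallback  # empty or undecidable text
    return code if code in ("en", "hi") else fallback

print(detect_tts_language("Hello, how are you today?"))  # -> en
print(detect_tts_language("नमस्ते, आप आज कैसे हैं?"))  # -> hi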
Thanks for answering.
Can you suggest an alternative that supports realtime language switching? 😊
The latest version from earlier today can do this now. Please do pip install -U realtimetts[coqui]; then you can switch the language at runtime with the set_language method of CoquiEngine.
Hey, thanks a lot for this update. Can you please explain how I can do that, and if you have example code that supports language switching at runtime, it would be a great help 😊
engine = CoquiEngine()
engine.set_language("de")  # German
engine.set_language("es")  # Spanish
Supported languages: English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Turkish (tr), Russian (ru), Dutch (nl), Czech (cs), Arabic (ar), Chinese (zh-cn), Japanese (ja), Hungarian (hu), Korean (ko), Hindi (hi).
So will the TTS autodetect the language of the response, or do I have to add langdetect?
You need to add langdetect or something similar and switch the language (and voice reference) depending on the result.
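As a rough sketch of wiring this together (the voice paths are placeholders, and set_voice is assumed here to accept a reference wav path, as in the earlier examples):

from RealtimeTTS import TextToAudioStream, CoquiEngine
from langdetect import detect

# Hypothetical reference wavs, one per language the assistant should speak.
VOICES = {
    "en": "./voices/english_speaker.wav",
    "hi": "model/speakers-hi_train_hindifemale_01305.wav",
}

if __name__ == "__main__":
    engine = CoquiEngine(language="en", voice=VOICES["en"])
    stream = TextToAudioStream(engine)

    def speak(llm_response: str):
        lang = detect(llm_response)
        lang = lang if lang in VOICES else "en"  # fall back to English
        engine.set_language(lang)                # runtime switch, no model reload
        engine.set_voice(VOICES[lang])           # matching reference voice
        stream.feed(llm_response).play()

    speak("Hello, this sentence should come out with the English voice.")
    speak("यह वाक्य हिंदी आवाज़ में बोला जाना चाहिए।")

    engine.shutdown()

Feeding and playing work the same as in the earlier examples; only the language and voice reference change between responses.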
So do I also need to add two models, one for Hindi and one for English?
Hey, I tried this but it is showing this error:
WARNING:root:engine coqui is the only engine available, can't switch to another engine
Can you provide proper working code, because I am new to this type of stuff 🥲