aeneas

How do I deal with mixed languages in a sentence/plain text line?

Open ErfolgreichCharismatisch opened this issue 3 years ago • 3 comments

If two languages are combined within one sentence, the algorithm cannot align any of the sentences that follow, i.e. all following sentences are out of sync. I tried task_language=language1,language2, which wasn't accepted, and task_language=language1|task_language=language2 wasn't accepted either. How do I deal with mixed languages in a sentence/plain text line?

I think you can detect the languages first and then pass them to your aligner.

https://www.geeksforgeeks.org/detect-an-unknown-language-using-python/#:~:text=The%20idea%20behind%20language%20detection,various%20modules%20for%20language%20detection.
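As a rough illustration of the "detect first" idea, here is a minimal sketch that splits one line into runs by Unicode script, using only the Python standard library. It rests on a strong assumption: the two languages are written in different scripts (for example Latin and Cyrillic). For two languages that share a script, such as German and English, it cannot tell them apart and a real language detection library would be needed instead. The function names `script_of` and `split_by_script` are made up for this example and are not part of aeneas.

```python
import unicodedata


def script_of(ch):
    """Return a coarse script label for a letter, or None for neutral characters."""
    if not ch.isalpha():
        return None  # spaces, digits and punctuation do not decide the script
    name = unicodedata.name(ch, "")
    # Unicode names of letters start with the script, e.g. "LATIN SMALL LETTER A".
    return name.split(" ")[0] or None


def split_by_script(line):
    """Split a line into (script, text) runs; neutral characters stay with the current run."""
    runs = []
    current_script = None
    buf = ""
    for ch in line:
        script = script_of(ch)
        if script is not None and current_script is not None and script != current_script:
            # The script changed: close the run collected so far.
            runs.append((current_script, buf))
            buf = ""
        if script is not None:
            current_script = script
        buf += ch
    if buf:
        runs.append((current_script, buf))
    return runs


if __name__ == "__main__":
    for script, text in split_by_script("Hello мир again"):
        print(script, repr(text))
```

Whether the resulting runs can then be aligned one language at a time depends on how the text and audio are organised, so treat this only as a preprocessing idea, not as something aeneas supports directly.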

yasntrk avatar Nov 25 '21 23:11 yasntrk

I suspect the issue is that the default speech synthesizer (espeak) cannot generate proper speech for two languages in a single sentence, so the generated alignment for that sentence is really bad, which throws off the alignment of the remaining sentences.
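One way to check this suspicion would be to look for the fragment where the alignment first goes wrong. The sketch below is only an illustration under stated assumptions: it assumes the sync map was written in JSON with a top-level "fragments" list whose entries carry "id", "begin", "end" and "lines" fields, and it uses characters per second as a crude proxy for alignment quality. The function name `flag_suspect_fragments` and the `tolerance` value are hypothetical choices, not anything aeneas provides.

```python
import json
from statistics import median


def flag_suspect_fragments(sync_map_json, tolerance=2.0):
    """Return ids of fragments whose speaking rate is far from the median rate.

    Assumes a JSON sync map with a "fragments" list of objects that have
    "id", "begin", "end" (seconds, as strings) and "lines" (list of strings).
    """
    fragments = json.loads(sync_map_json)["fragments"]
    rates = {}
    suspects = []
    for frag in fragments:
        duration = float(frag["end"]) - float(frag["begin"])
        n_chars = sum(len(line) for line in frag["lines"])
        if n_chars == 0:
            continue  # nothing to judge for empty fragments
        if duration <= 0:
            suspects.append(frag["id"])  # text squeezed into no time at all
            continue
        rates[frag["id"]] = n_chars / duration
    if rates:
        typical = median(rates.values())
        for frag_id, rate in rates.items():
            # Flag fragments spoken much faster or much slower than usual.
            if rate > typical * tolerance or rate < typical / tolerance:
                suspects.append(frag_id)
    return suspects
```

If the first flagged fragment is the mixed-language sentence, that would be consistent with the explanation above; it would not prove it, since other things (noise, long pauses) can also distort the rate.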

I think we'd need to use a "code-switching" speech synthesizer to fix that. A quick web search turned up this one (see the "code-switching" examples at the bottom). Google didn't initially release code for that model, but I wouldn't be surprised if there are now at least a few open-source projects that have done something similar.

zxul767 avatar Aug 04 '23 05:08 zxul767

Here's what ChatGPT suggests:

As of my last update in September 2021, "code-switching," which involves seamlessly switching between two or more languages within a sentence or conversation, is a challenging task for Text-to-Speech (TTS) systems. However, some TTS tools have been working on supporting multilingual capabilities and handling code-switching to some extent. Here are a few TTS tools that have been exploring code-switching or multilingual support:

  1. Google Text-to-Speech (gTTS): Google's TTS system has been known to handle some level of multilingual text, including code-switching between languages. It uses neural network-based models and can switch between supported languages relatively well.
  2. Mozilla TTS (Tacotron 2): Mozilla's TTS system, also known as Tacotron 2, has been evolving to handle multilingual input. It supports multiple languages, and with appropriate configuration, it may be able to handle code-switching scenarios.
  3. Facebook's wav2vec 2.0 + Hugging Face's TTS: wav2vec 2.0 by Facebook AI Research (FAIR) and Hugging Face's TTS library offer multilingual TTS capabilities. By leveraging the power of wav2vec 2.0's pretrained models, TTS systems can handle multilingual input and code-switching to some extent.
  4. DeepMind's WaveNet and Tacotron: Some researchers have experimented with DeepMind's WaveNet and Tacotron TTS systems to handle multilingual code-switching scenarios. While not native to the models, certain adaptations can be made to support code-switching.

zxul767 avatar Aug 04 '23 06:08 zxul767