Arabic TTS text preprocessing

Open • mush42 opened this issue 1 year ago • 1 comment

Hi

Thanks for your awesome work!

I tried the Arabic TTS voice (Kareem), and I noticed that an important text preprocessing step is missing.

Arabic text is usually written unvocalized (i.e., without diacritics). For the speech to be intelligible, the text must be vocalized (diacritized) before phonemization. A lightweight neural network is usually used for vocalization. This important preprocessing step is missing from sherpa-onnx.
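To illustrate the ordering described above, here is a minimal sketch in Rust. Everything in it is hypothetical: `toy_vocalize` is a one-word lookup standing in for a neural vocalizer, the phonemizer is passed in as a closure, and none of the names come from sherpa-onnx, Piper, or Libtashkeel. The bare word كتب is ambiguous without diacritics (it could be read as "kataba", "kutub", or "kutiba"), and the sketch simply assumes the reading "kataba".

```rust
/// Hypothetical stand-in for a neural vocalizer: it only knows one word.
/// A real system would run a diacritization model here instead.
fn toy_vocalize(text: &str) -> String {
    text.split_whitespace()
        .map(|word| match word {
            // Bare "كتب" mapped to kaf+fatha, ta+fatha, ba+fatha ("kataba").
            "كتب" => "\u{0643}\u{064E}\u{062A}\u{064E}\u{0628}\u{064E}",
            other => other,
        })
        .collect::<Vec<_>>()
        .join(" ")
}

/// Counts Arabic diacritic marks in the range U+064B..=U+0652
/// (tanwin, fatha, damma, kasra, shadda, sukun).
fn count_diacritics(text: &str) -> usize {
    text.chars()
        .filter(|c| ('\u{064B}'..='\u{0652}').contains(c))
        .count()
}

/// Runs vocalization first and hands the vocalized text to the phonemizer,
/// so the phonemizer never sees bare, ambiguous text.
fn vocalize_then_phonemize<V, P>(text: &str, vocalize: V, phonemize: P) -> String
where
    V: Fn(&str) -> String,
    P: Fn(&str) -> String,
{
    let vocalized = vocalize(text);
    phonemize(&vocalized)
}
```

Calling `vocalize_then_phonemize(text, toy_vocalize, my_phonemizer)` keeps the two stages decoupled, so the stub could later be swapped for a real vocalizer without touching the phonemizer.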

Piper's Arabic voice has been trained with vocalized text. I say this because I prepared and audited the data used for training that voice.

Fortunately, I'm developing a package for Arabic-text vocalization named Libtashkeel.

It is written in Rust, has a C API, is designed to be cross-platform, and the model is embedded in the library itself. Here's the library running in the browser via WASM.

The library has a single entry point function that takes a string and outputs a string.
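As a rough sketch of what a string-in, string-out entry point can look like across a C ABI, consider the Rust code below. It is not Libtashkeel's actual API: the names `demo_vocalize`, `demo_free_string`, and `vocalize_via_c_abi` are made up, the body is an identity placeholder rather than a model, and the rule that the caller frees the result through a matching function is an assumption of this sketch.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical entry point (NOT a real Libtashkeel symbol).
/// Takes a NUL-terminated UTF-8 string and returns a newly allocated one,
/// or null if the input is null or not valid UTF-8.
/// A real exported symbol would also need a `no_mangle` attribute.
pub unsafe extern "C" fn demo_vocalize(input: *const c_char) -> *mut c_char {
    if input.is_null() {
        return std::ptr::null_mut();
    }
    // SAFETY: the caller guarantees `input` is a valid NUL-terminated string.
    let text = match unsafe { CStr::from_ptr(input) }.to_str() {
        Ok(t) => t,
        Err(_) => return std::ptr::null_mut(),
    };
    // Placeholder: a real implementation would run the vocalization model.
    let vocalized = text.to_owned();
    match CString::new(vocalized) {
        Ok(s) => s.into_raw(), // ownership passes to the caller
        Err(_) => std::ptr::null_mut(),
    }
}

/// Frees a string previously returned by `demo_vocalize`.
pub unsafe extern "C" fn demo_free_string(ptr: *mut c_char) {
    if !ptr.is_null() {
        // SAFETY: `ptr` came from `CString::into_raw` in `demo_vocalize`.
        unsafe { drop(CString::from_raw(ptr)) };
    }
}

/// Safe wrapper showing the call sequence a C or C++ caller would follow:
/// pass a string in, copy the result out, then free the returned buffer.
pub fn vocalize_via_c_abi(text: &str) -> Option<String> {
    let input = CString::new(text).ok()?; // rejects interior NUL bytes
    // SAFETY: `input` is a valid NUL-terminated string for this call.
    let raw = unsafe { demo_vocalize(input.as_ptr()) };
    if raw.is_null() {
        return None;
    }
    // SAFETY: `raw` is non-null and was produced by `demo_vocalize`.
    let out = unsafe { CStr::from_ptr(raw) }.to_string_lossy().into_owned();
    unsafe { demo_free_string(raw) };
    Some(out)
}
```

The point of the sketch is only that a single function of this shape leaves very little surface area for a C++ host to bind against.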

I can't contribute a PR since I'm not familiar with C++, but I can help integrate libtashkeel from the Rust side by any means necessary.

Best,
Musharraf

mush42 • Apr 28 '24 22:04

I'd like to add that the tashkeel model shipped with piper-phonemize is not good at all (although I helped implement it). The library I'm developing works better since it has been trained on a lot of modern Arabic data.

mush42 • Apr 28 '24 22:04