
Add extra Arabic diacritic and TTS models

Open Kentoseth opened this issue 1 year ago • 3 comments

Hi there,

Thanks again for this wonderful project. I think we previously discussed that the format of the models for your app should be .ort. Fortunately for us, we now have more of these for Arabic.

Here are 2 diacritic models in .onnx format:

https://github.com/nipponjo/arabic_vocalizer

(I found some limitations with libtashkeel and opened an issue with the author to clarify: https://github.com/mush42/libtashkeel/issues/2 )

I think the only complicated part here will be in the selection option (not present) of the vocalizer model. Right now it seems to default to the only model available.

And here is the .onnx TTS model:

https://github.com/nipponjo/tts_arabic

I wasn't able to find the model file in the repo, though.

(I don't know if the .onnx format will be an issue, as it is an intermediate format rather than the production one.)

Kentoseth · Jun 04 '24 17:06

I'm very happy that there is more Arabic support :) I will definitely check out these models.

> I think the only complicated part here will be in the selection option (not present) of the vocalizer model. Right now it seems to default to the only model available.

Yes, this part is missing and has to be implemented.
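Below is a minimal sketch of what such a selection option might look like. It is written in Rust only to keep one language across the examples; Speech Note itself is C++. `DiacritizerModel`, `select_model`, and the model ids are hypothetical and are not Speech Note's actual code. The idea is to use the user's saved choice when it matches a registered model. Otherwise the code falls back to the first entry, which reproduces today's behaviour of defaulting to the only available model.

```rust
/// Hypothetical descriptor for an Arabic diacritizer (vocalizer) model.
#[derive(Debug, Clone, PartialEq)]
pub struct DiacritizerModel {
    pub id: &'static str,    // stable key stored in the user's settings
    pub label: &'static str, // name shown in the selection UI
}

/// Returns the model matching the saved id, or the first registered model
/// when nothing is saved or the saved id is unknown. Returns None only
/// when no models are registered at all.
pub fn select_model<'a>(
    models: &'a [DiacritizerModel],
    saved_id: Option<&str>,
) -> Option<&'a DiacritizerModel> {
    saved_id
        .and_then(|id| models.iter().find(|m| m.id == id))
        .or_else(|| models.first())
}

fn main() {
    // Placeholder entries; ids and labels are illustrative only.
    let models = [
        DiacritizerModel { id: "tashkeel-default", label: "Default" },
        DiacritizerModel { id: "vocalizer-alt", label: "Alternative" },
    ];
    if let Some(m) = select_model(&models, Some("vocalizer-alt")) {
        println!("{}", m.label);
    }
}
```

When only one model is registered, `select_model` always returns it, so adding the option would not change behaviour for existing users.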

> I don't know if the .onnx format will be an issue

No, it is not an issue. ONNX is used by Piper and Mimic 3, so all the needed libraries are already integrated and packed into the Flatpak package.

mkiol · Jun 05 '24 17:06

I've been discussing this with the libtashkeel author: https://github.com/mush42/libtashkeel/issues/2#issuecomment-2148183492

He informed me that the Piper model you are using from piper-phonemize is an MVP (minimum viable product) model and that he has since updated it to a better one.

It may be best to drop the MVP model entirely and use the .onnx model available here:

https://github.com/mush42/libtashkeel/blob/main/libtashkeel_base/data/ort/model.onnx

To summarize: if you drop the MVP model, the app will gain three new diacritic models and one new Arabic TTS model.

Kentoseth · Jun 05 '24 20:06

Thanks a lot for all the insights!

Indeed, Speech Note currently uses a C++ re-implementation of tashkeel borrowed from the Piper project. This version doesn't work with the latest ONNX model. To enable it, I need to integrate the newest libtashkeel. The problem is that libtashkeel is written in Rust, so I need to add a new compiler to my toolchain. It is a lot of hassle, but it is perfectly doable. I will try to do something for the next version (or the one after).
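One common way to call a Rust library from a C++ code base is to wrap it behind a small C ABI and link it as a static library. The sketch below assumes libtashkeel does not already ship C bindings. `tashkeel_diacritize`, `tashkeel_free_string`, and `diacritize_placeholder` are hypothetical names, and the placeholder simply echoes its input instead of running the real model.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical stand-in for the real libtashkeel call: it echoes the
/// input so the sketch compiles without the library.
fn diacritize_placeholder(text: &str) -> String {
    text.to_owned()
}

/// C ABI entry point; the C++ side could declare it as
/// `extern "C" char* tashkeel_diacritize(const char* text);`
/// Returns NULL for a NULL pointer, invalid UTF-8, or an interior NUL.
///
/// # Safety
/// `input` must be NULL or point to a valid NUL-terminated string.
// On the Rust 2024 edition this attribute must be written #[unsafe(no_mangle)].
#[no_mangle]
pub unsafe extern "C" fn tashkeel_diacritize(input: *const c_char) -> *mut c_char {
    if input.is_null() {
        return std::ptr::null_mut();
    }
    let text = match unsafe { CStr::from_ptr(input) }.to_str() {
        Ok(t) => t,
        Err(_) => return std::ptr::null_mut(), // not valid UTF-8
    };
    match CString::new(diacritize_placeholder(text)) {
        Ok(out) => out.into_raw(),      // ownership passes to the caller
        Err(_) => std::ptr::null_mut(), // output contained a NUL byte
    }
}

/// Releases a string returned by `tashkeel_diacritize`.
///
/// # Safety
/// `s` must be NULL or a pointer obtained from `tashkeel_diacritize`,
/// and must not be used after this call.
#[no_mangle]
pub unsafe extern "C" fn tashkeel_free_string(s: *mut c_char) {
    if !s.is_null() {
        drop(unsafe { CString::from_raw(s) });
    }
}

fn main() {
    let input = CString::new("مرحبا").unwrap();
    let out = unsafe { tashkeel_diacritize(input.as_ptr()) };
    let text = unsafe { CStr::from_ptr(out) }.to_str().unwrap().to_owned();
    unsafe { tashkeel_free_string(out) };
    println!("{text}");
}
```

In a real build, the wrapper would live in a library crate with `crate-type = ["staticlib"]` under `[lib]` in `Cargo.toml`. The C++ header could be written by hand or generated with cbindgen. Ownership stays with Rust: every pointer returned by `tashkeel_diacritize` must go back through `tashkeel_free_string`, never through C `free`.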

mkiol · Jun 07 '24 16:06