Merge ASR and TTS into one APK

Open Pantyhose-X opened this issue 6 months ago • 19 comments

I can't even download TTS. https://github.com/k2-fsa/sherpa-onnx/releases has only ASR! Where's TTS?

Pantyhose-X avatar Feb 10 '24 03:02 Pantyhose-X

I can't even download TTS. https://github.com/k2-fsa/sherpa-onnx/releases has only ASR! Where's TTS?

Please read our README.

csukuangfj avatar Feb 10 '24 04:02 csukuangfj

Please see https://k2-fsa.github.io/sherpa/onnx/tts/apk-engine.html

csukuangfj avatar Feb 10 '24 04:02 csukuangfj

The title of this issue does not match its text. As others have pointed out, TTS is in a different APK, and at the moment there is a separate APK for each language and each voice. Google chose to ship TTS and STT/ASR in one APK, probably because they use similar or identical voice database assets. Merging might be interesting in the future, but it depends on the developers' view: if merging TTS and STT/ASR leads to more complex code for a gain of only about 70 to 100 MiB of free space, the developers should choose whatever is best from their perspective. Honestly, I would love to have everything in one app, but compared to other issues I don't think this is a priority. Consider that the STT/ASR app does not implement RecognitionService, since I cannot see it among the voice input options, and does not support ACTION_RECOGNIZE_SPEECH, since I cannot use it from my keyboard.

paolo-caroni avatar Feb 10 '24 10:02 paolo-caroni

@csukuangfj what do you think about this topic? Would unifying ASR/STT and TTS in one APK be useful? It seems to me that the C++ code is the same for ASR/STT, voice identification, and TTS (28 MiB for all CPU architectures), but the ONNX models seem to be different. If you confirm that the models are different and incompatible, I think this issue should be closed.

paolo-caroni avatar Mar 05 '24 20:03 paolo-caroni

The code is shared but the models are different.

Also, you can install the ASR APK, the TTS APK, and the speaker identification APK simultaneously on your phone.

csukuangfj avatar Mar 07 '24 02:03 csukuangfj

Also, you can install the ASR APK, the TTS APK, and the speaker identification APK simultaneously on your phone.

Sure, but in that case there are at least 3 different apps that will have to be updated in the stores (#520: F-Droid, Google Play, Xiaomi, Huawei, Samsung, Amazon, etc.). A single app would be simpler in that respect, but it would complicate code maintenance and development. Also, as proposed by @mablue in #569, the icons for TTS, STT, and identification could be different: based on a similar sherpa logo, but still distinct. I can do that, but I need some confirmations: what is the licence of the original logo (Apache? Creative Commons?), and would you (and the other developers) like that idea?

Also, if there are 3 different apps, it might be a good idea to split them into separate repositories (still under the official k2-fsa), since in the future there will be more issues about app-specific bugs rather than about the sherpa-onnx code as a whole.

@csukuangfj What do you (and others) think about that?

paolo-caroni avatar Mar 07 '24 05:03 paolo-caroni

The Sherpa TTS download page is not accessible to blind users. They also need a sherpa-onnx Telegram group to connect directly with the developers. Please create one, @csukuangfj; they asked me to pass this on to you. I think the first problem is fixed by @jing332's client for sherpa, but I can't use it. It still doesn't work. I don't know why, but it produces no voice on any phone!! Its UI and its management of voices and languages are very good, though! It is advanced, but still not like tts server https://github.com/jing332/SherpaOnnxTtsEngineAndroid

Designing icons matters to people too. And we should have one client: just one client with multiple TTS voices. And ASR functionality for Persian is still not available

mablue avatar Mar 08 '24 17:03 mablue

I think the first problem is fixed by @jing332's client for sherpa, but I can't use it. It still doesn't work. I don't know why, but it produces no voice on any phone!! Its UI and its management of voices and languages are very good, though! It is advanced, but still not like tts server https://github.com/jing332/SherpaOnnxTtsEngineAndroid

This is free software, written by a community of people around the world (and maybe Xiaomi, if I'm not wrong). You cannot demand anything, least of all a fully functional app in zero days.

And ASR functionality for Persian is still not available

You opened #559. Have you trained a Persian model?

paolo-caroni avatar Mar 10 '24 14:03 paolo-caroni

I will learn icefall. I haven't started learning it yet, but I am interested. The Ganjoor site is a good source of Farsi texts and audio.

For example, this page: https://ganjoor.net/saadi/golestan/gbab1/sh10 It has many voices and many poems, with a time-synced UI during playback. I'll try to train with this source.

mablue avatar Mar 10 '24 15:03 mablue

I will learn icefall. I haven't started learning it yet, but I am interested. The Ganjoor site is a good source of Farsi texts and audio.

@mablue this is totally off-topic, but I think it is simpler for you to use an established dataset such as Common Voice, which includes Persian and is supported by icefall.

paolo-caroni avatar Mar 10 '24 21:03 paolo-caroni

@csukuangfj we are going off-topic, but can you respond to the logo question? Since there is no reason to merge everything into one APK (so this issue can be closed), what do you think about making different logos for TTS, ASR/STT, and speaker identification?

paolo-caroni avatar Mar 10 '24 21:03 paolo-caroni

what do you think about making different logos for TTS, ASR/STT, and speaker identification

Yes, that sounds good to me. Would you like to contribute?

csukuangfj avatar Mar 11 '24 02:03 csukuangfj

Also, as proposed by @mablue in #569, the icons for TTS, STT, and identification could be different: based on a similar sherpa logo, but still distinct. I can do that, but I need some confirmations: what is the licence of the original logo (Apache? Creative Commons?), and would you (and the other developers) like that idea?

Also, if there are 3 different apps, it might be a good idea to split them into separate repositories (still under the official k2-fsa), since in the future there will be more issues about app-specific bugs rather than about the sherpa-onnx code as a whole.

@csukuangfj What do you (and others) think about that?

@csukuangfj you probably missed my message from 4 days ago. Yes, I would like to contribute, but please confirm the licence of the original logo, since I will have to modify it by combining it with other images.

paolo-caroni avatar Mar 11 '24 03:03 paolo-caroni

please confirm the licence of the original logo

The logo is ours, and we publish all of our work under the Apache 2.0 license.

csukuangfj avatar Mar 11 '24 04:03 csukuangfj

Since there is no reason to merge everything into one APK

Why? We are generating 20 gigabits of duplicated binaries and other data across the sherpa TTS, ASR, and identification APK files. The reason nobody can contribute to this project is that there is no small, all-in-one, plain Java/C++ client, and everything is scattered. Many developers and I are waiting for a single client with many models, so that we can generate what we want with one client. If we can't make an all-in-one APK for all TTS models, this will be a failed project, because no one understands where to start with updating and so on. I think we should continue with @jing332's project. I would love to do something, but the internet in Iran is trash 🗑️. Many other people and I still can't get any voice output from his/her work (multi-language sherpa). I'm on Android 14 and the others are on Android 12.

mablue avatar Mar 13 '24 15:03 mablue

@mablue merging ASR/STT, TTS, and speaker identification is TOTALLY different from having all the languages in one TTS APK. Please read this issue carefully.

paolo-caroni avatar Mar 13 '24 16:03 paolo-caroni

@mablue merging ASR/STT, TTS, and speaker identification is TOTALLY different from having all the languages in one TTS APK. Please read this issue carefully.

If we can't merge everything, then just merge TTS. TTS can save lives. Blind people use it all around the world. Some people in Iran don't have Google TTS in Persian because of sanctions. And we need the merge so that we can have English and maybe other languages alongside Persian.

mablue avatar Mar 14 '24 15:03 mablue