vosk-android-service

Incompatibility with various Android keyboards; wrap vosk-android in an IME service (especially for standalone use/accessibility)

Open · drew-sinha opened this issue · 3 comments

As of commit 5e02806*, the vosk service is fully functional with AnySoftKeyboard but does not work with OpenBoard or FlorisBoard**. Both of the latter use the InputMethodManager (IMM) framework instead of talking to a speech recognition service, and neither recognizes the vosk service as an input method.

Given that, at the time of posting (3/2023), there is no open-source STT alternative to Google and the like, relying on Android's RecognitionService is appropriate***. The vosk service works perfectly well as a de facto plugin for AnySoftKeyboard. However, also wrapping vosk-android in an IME service, so that it can be used standalone, would help accessibility (e.g. for people with disabilities for whom voice is the most practical input method). This isn't an unreasonable ask, given that Google already does this with its speech services.

Although I don't have much experience with IMM/IMEs, I think this should be fairly straightforward: add an intermediate layer on top of the vosk recognition service that is declared in the manifest as its own (IME) service. Each keyboard can then decide which service to latch onto for STT.
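A minimal sketch of such a wrapper, under a few assumptions: the app already holds the `RECORD_AUDIO` permission (granted through an ordinary activity, since a service cannot show the runtime prompt), and `VoskWrapperIme` is a hypothetical name, not a class in vosk-android-service. The IME does no recognition itself; `SpeechRecognizer.createSpeechRecognizer(Context)` binds to the system's default voice-input service (the vosk service, if the user has selected it), and the best result is committed into the focused field. To show up in the keyboard picker, the service must also be declared in the manifest with `android:permission="android.permission.BIND_INPUT_METHOD"`, an `android.view.InputMethod` intent filter, and `android.view.im` meta-data pointing at an `<input-method>` XML resource.

```kotlin
import android.content.Intent
import android.inputmethodservice.InputMethodService
import android.os.Bundle
import android.speech.RecognitionListener
import android.speech.RecognizerIntent
import android.speech.SpeechRecognizer
import android.view.View
import android.widget.Button

// Hypothetical IME wrapper (illustrative name, not from vosk-android-service).
// It performs no recognition itself: it forwards to the default RecognitionService
// and commits the best result into the focused text field.
class VoskWrapperIme : InputMethodService() {

    // Receives callbacks from the recognition service; only final results are used.
    private val listener = object : RecognitionListener {
        override fun onResults(results: Bundle?) {
            val best = results
                ?.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)
                ?.firstOrNull() ?: return
            // Insert the text and place the cursor right after it.
            currentInputConnection?.commitText(best, 1)
        }
        override fun onReadyForSpeech(params: Bundle?) {}
        override fun onBeginningOfSpeech() {}
        override fun onRmsChanged(rmsdB: Float) {}
        override fun onBufferReceived(buffer: ByteArray?) {}
        override fun onEndOfSpeech() {}
        override fun onError(error: Int) {}
        override fun onPartialResults(partialResults: Bundle?) {}
        override fun onEvent(eventType: Int, params: Bundle?) {}
    }

    private var recognizer: SpeechRecognizer? = null

    override fun onCreate() {
        super.onCreate()
        // Binds to the user's default voice-input service (e.g. vosk, if selected).
        recognizer = SpeechRecognizer.createSpeechRecognizer(this).apply {
            setRecognitionListener(listener)
        }
    }

    // The whole keyboard UI is a single button that starts one recognition session.
    override fun onCreateInputView(): View =
        Button(this).apply {
            text = "Speak"
            setOnClickListener { startListening() }
        }

    private fun startListening() {
        val intent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH)
            .putExtra(
                RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM
            )
        recognizer?.startListening(intent)
    }

    override fun onDestroy() {
        recognizer?.destroy()
        super.onDestroy()
    }
}
```

If the keyboard should always use one particular engine rather than the user's default, `SpeechRecognizer.createSpeechRecognizer(Context, ComponentName)` can pin a specific recognition service instead.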

*Additional configuration: built with the Gradle command-line toolchain as a debug universal APK (compileSdk 33); Gradle version 7.6 (the default, even though the .kts build script depends on 7.2.2; no API level specified); built against OpenJDK 14. Device: Pixel 3 running Android 12. Additional apps on the device: AnySoftKeyboard (v1.11.7137 from F-Droid, up to date).

**There is nothing special about these two keyboards per se; I chose them as the main open alternatives I've seen on Reddit and F-Droid. Note that I haven't tested Kõnele, but I would be surprised if it weren't compatible, given @ccoreilly's work on localstt.

***I am hesitant to say that the IMM/IME approach is better than using the speech recognition service directly, or to second-guess any IME designer's preference for either. As noted above, I don't think the two are incompatible per se; they can be seen as serving separate use cases. Any thoughts would be appreciated. Tagging some people who may have useful input: @patrickgold, @dslul, @ewheelerinc, @ildar, @kaljurand, @felicis. CC: @stypox

Edit: fixed the misspelling of kaljurand; added stypox.

drew-sinha · Mar 25 '23 19:03

I think two general principles make sense:

  • an IME app (especially one that wants to be known as "open") should offer single-click (or single-swipe) access to the Android speech recognizer (https://developer.android.com/reference/android/speech/SpeechRecognizer) and to the previous and next IMEs (https://developer.android.com/reference/android/inputmethodservice/InputMethodService#switchToNextInputMethod(boolean)), i.e. at least 3 buttons through which the user can reach related services outside the IME app. (The user could, of course, be offered a setting to remove these buttons from the UI.) (I've followed this principle more or less in https://github.com/Kaljurand/K6nele.)
  • an app should do one thing and do it well, e.g. an app that provides a RecognitionService (https://developer.android.com/reference/android/speech/RecognitionService) should not also have to provide an InputMethodService (https://developer.android.com/reference/android/inputmethodservice/InputMethodService).
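A rough sketch of the first principle, assuming minSdk 28 (the API level at which `switchToPreviousInputMethod()` and `switchToNextInputMethod(Boolean)` were added to `InputMethodService`). `buildSwitcherRow` is a hypothetical helper, not taken from K6nele or any library. The mic action is passed in as a lambda, so the row itself contains no recognizer, which also fits the second principle.

```kotlin
import android.inputmethodservice.InputMethodService
import android.view.View
import android.widget.Button
import android.widget.LinearLayout

// Hypothetical helper: a row with the three buttons from the first principle.
// Assumes minSdk 28, where the IME-switching methods below are available.
fun InputMethodService.buildSwitcherRow(onMic: () -> Unit): View {
    val ime = this
    return LinearLayout(ime).apply {
        orientation = LinearLayout.HORIZONTAL
        // Hand off to speech recognition (e.g. start a SpeechRecognizer session).
        addView(Button(ime).apply {
            text = "Mic"
            setOnClickListener { onMic() }
        })
        // Return to the previously used IME.
        addView(Button(ime).apply {
            text = "Prev"
            setOnClickListener { ime.switchToPreviousInputMethod() }
        })
        // Cycle to the next enabled IME (false = not restricted to this IME's subtypes).
        addView(Button(ime).apply {
            text = "Next"
            setOnClickListener { ime.switchToNextInputMethod(false) }
        })
    }
}
```

Inside an `InputMethodService`, `onCreateInputView()` could then return `buildSwitcherRow { /* start listening */ }` alongside the rest of the keyboard layout.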

The second principle is a bit problematic in current Android, where it is not easy for the end user to install multiple apps at once and use them in combination (finding them in the app store, granting permissions, etc.). So, unfortunately, it makes sense to bundle several independent services into a single app.

Kaljurand · Mar 25 '23 22:03

For the IME part, there is also https://github.com/ElishaAz/Sayboard/issues/25

nshmyrev · Mar 31 '23 00:03

The feature we're discussing would add much-needed functionality for the growing number of people who want to use de-Googled phones. For the sake of user experience and adoption, I would suggest packaging the IME and RecognitionService functionality into one app, unless the services would interfere with each other.

That way, the related FOSS keyboards (FlorisBoard, AnySoftKeyboard) could point to this one app/project and draw more attention to it.

I'd also say that Sayboard (https://github.com/ElishaAz/Sayboard) may be the right project to bundle all of these services together, along with a voice keyboard.

The maintainer has said the app is meant to be a companion voice IME (https://github.com/ElishaAz/Sayboard/issues/4), so it's already heading in that direction.

From the user's side, it makes sense to download one application that serves as an IME, a recognition service, and a standalone voice keyboard, with an apt name like Sayboard.

Looking forward to seeing the collaboration.

aboveagency · Apr 09 '23 22:04