STT language auto-detection

Open rubdos opened this issue 1 year ago • 3 comments

Currently, the STT D-Bus API requires selecting a model by language. However, Whisper has a model that does language auto-detection.

It would be nice if one could call SttTranscribeFile (and related APIs, I suppose) with string:lang:auto, such that the language gets inferred by the STT. The relevant signals should then return the detected language (but I think it's mostly there already).

Additionally, the relevant user interfaces (I'm on SailfishOS) should get some element that allows downloading this general model.

rubdos avatar Jun 29 '24 16:06 rubdos

Hi, thanks for the idea. Language auto detection can indeed be useful.

I think I can add this in the next version.

mkiol avatar Jun 30 '24 14:06 mkiol

Automatic language detection has been implemented in v4.6.0 ~~(not yet uploaded to OpenRepos)~~.

To use it, you need to download a model from the "Auto detected" language category and pass "auto" as the language in the Speech Note API.

STT with "Auto detected" models is a bit slower than with models for a defined language, so if you know the language, it is still better to use a model for that specific language. In v4.6.0, STT with Whisper is much faster than before, so the slowdown may not be a problem.
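As a rough client-side illustration of the advice above: pass the specific language when it is known, and fall back to "auto" otherwise. This is only a sketch. The helper names `pick_stt_lang` and `dbus_send_string_arg` are made up for this example and are not part of Speech Note; the only things taken from this thread are the "auto" value and the `SttTranscribeFile` call it is meant for. The `string:` prefix is the usual `dbus-send` notation for a string argument.

```python
from typing import Optional

# Language value that requests auto-detection, as described in this thread.
AUTO_LANG = "auto"


def pick_stt_lang(known_lang: Optional[str]) -> str:
    """Return the language id to pass to the STT call (hypothetical helper).

    A known language is preferred because language-specific models are
    faster; "auto" is used only when no language is given.
    """
    if known_lang is not None and known_lang.strip():
        return known_lang.strip()
    return AUTO_LANG


def dbus_send_string_arg(value: str) -> str:
    """Format a value as a dbus-send style string argument (hypothetical helper)."""
    return f"string:{value}"


if __name__ == "__main__":
    # Language known: use the language-specific model.
    print(dbus_send_string_arg(pick_stt_lang("en")))
    # Language unknown: let the "Auto detected" model infer it.
    print(dbus_send_string_arg(pick_stt_lang(None)))
```

The "auto" path only works if a model from the "Auto detected" category has been downloaded first.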

Changes: 5d0b9655525bbeeeee70b134f530b767575f02c4 57e1ee3cf980b09597a8f7bacc600607c260b1c0

mkiol avatar Aug 03 '24 13:08 mkiol

This is beautiful, thank you! <3

Untested patch for Whisperfish: https://gitlab.com/whisperfish/whisperfish/-/merge_requests/613

rubdos avatar Aug 03 '24 15:08 rubdos

Tested, working and released on WF!

rubdos avatar Sep 20 '24 06:09 rubdos

Cool. Glad I could help :)

mkiol avatar Sep 22 '24 12:09 mkiol