dsnote Feature request: integrate grammar/punctuation models for dictation.

Integrating separate grammar/punctuation models into speech-to-text recognition would be very useful. Users could then dictate text with a low-quality, low-resource model and still get better-quality output without needing more resources.

Apr 18 '25 19:04 alarm10101010

Grammar correction is not yet implemented, but punctuation is supported.

If you use low-resource STT models such as DeepSpeech/Coqui or Vosk, you can get punctuation correction by installing an additional "Punctuation" model. Once it is installed, enable the "Restore punctuation" option in the settings.

BTW, is there any "grammar correction" model that you have tried and can recommend? Someone suggested using an LLM for this task, but that would definitely not be "low resource" friendly. I'm looking for something lighter...

Apr 19 '25 15:04 mkiol

Thanks for the info.

I can't recommend a specific transformer model, but in my experience the FUTO Board produces very good results with a very small speech-to-text model and a small transformer model.

If I find something that could help, I will open an issue with the info.

Apr 20 '25 22:04 alarm10101010

> If I find something that could help, I will open an issue with the info.

Thanks!

Regarding FUTO: in v4.8.0 Beta, you can download and use FUTO Speech-to-Text voice models ("WhisperCpp FUTO"). These models are compatible with the WhisperCpp engine, so they work out of the box :)

Apr 21 '25 13:04 mkiol