Feature request: integrate grammar/punctuation models for dictation.

Open alarm10101010 opened this issue 8 months ago • 3 comments

Integrating separate grammar/punctuation models into speech-to-text recognition would be very useful. Users could then dictate with a low-quality, low-resource model and still get better-quality output without needing more resources.

alarm10101010 avatar Apr 18 '25 19:04 alarm10101010

Grammar correction is not yet implemented, but punctuation is supported.

If you use low-resource STT models such as DeepSpeech/Coqui or Vosk, you can enable punctuation correction by installing an additional "Punctuation" model. After installing this model, you can enable the "Restore punctuation" option in the settings.

BTW, is there any "grammar correction" model that you have tried and can recommend? Someone suggested using an LLM for this task, but that would definitely not be "low resource" friendly. I'm looking for something lighter...

mkiol avatar Apr 19 '25 15:04 mkiol

Thanks for the info.

I can't recommend a specific transformer model, but in my experience the FUTO Board produces very good results with a very small speech-to-text model and a small transformer model.

If I find something that could help, I will open an issue with the info.

alarm10101010 avatar Apr 20 '25 22:04 alarm10101010

> If I find out something that could help I would open an issue with the info.

Thanks!

Regarding FUTO: in v4.8.0 Beta, you can download and use the FUTO Speech-to-Text voice models ("WhisperCpp FUTO"). These models are compatible with the WhisperCpp engine, so they work out of the box :)

mkiol avatar Apr 21 '25 13:04 mkiol