Can the VOSK grammar file be used to exclude words?

Open ideasman42 opened this issue 1 year ago • 4 comments

Following this issue about NSFW words appearing in dictation output (https://github.com/ideasman42/nerd-dictation/issues/99), it would be useful to know whether the VOSK grammar file can be used to exclude words (instead of limiting recognition to a fixed set of words).

Is this currently supported? The only documentation I found for the VOSK grammar file is in the source header, which is not very detailed.

ideasman42 avatar Apr 21 '23 02:04 ideasman42
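
For context on what "limiting" means here: the grammar is given to the recognizer as a JSON array of allowed words or phrases, so an exclusion can only be approximated by listing everything except the unwanted words. The following is a minimal sketch under that assumption; `build_allow_grammar` and the sample vocabulary are hypothetical, and runtime grammars are, as far as I know, only honoured by models that support a dynamic vocabulary.

```python
import json


def build_allow_grammar(vocabulary, excluded):
    """Return a JSON grammar string listing every vocabulary word
    that is not in the exclusion set (hypothetical helper)."""
    excluded_lower = {word.lower() for word in excluded}
    allowed = sorted({word.lower() for word in vocabulary} - excluded_lower)
    # "[unk]" lets the recognizer emit an unknown-word token instead of
    # forcing every sound onto one of the allowed words.
    return json.dumps(allowed + ["[unk]"])


grammar = build_allow_grammar(["yes", "no", "darn"], ["darn"])
print(grammar)
# The string would then be passed as the third argument, e.g.
# vosk.KaldiRecognizer(model, 16000, grammar)
```

This stays an allow-list: for a full dictation vocabulary the list becomes very large, which is why a real exclusion mechanism is being asked about.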

You can exclude such words in a postprocessing step; there is no need for a grammar.

nshmyrev avatar Apr 21 '23 02:04 nshmyrev
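
A minimal sketch of such a postprocessing step, assuming the recognizer result is the usual JSON string with a `"text"` field. The function name `filter_result` and the word list are hypothetical, not part of the VOSK API.

```python
import json


def filter_result(result_json, excluded):
    """Drop excluded words from the "text" field of a recognizer result.

    result_json: JSON string such as '{"text": "some words"}'
    excluded:    set of lowercase words to remove (assumed word list)
    """
    result = json.loads(result_json)
    words = result.get("text", "").split()
    kept = [word for word in words if word.lower() not in excluded]
    result["text"] = " ".join(kept)
    return result


filtered = filter_result('{"text": "well darn that works"}', {"darn"})
print(filtered["text"])
```

Note that this only removes words after recognition; it cannot make the model choose a different, similar-sounding word.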

Same as https://github.com/alphacep/vosk-api/issues/623 I suppose

nshmyrev avatar Apr 21 '23 02:04 nshmyrev

The problem with excluding words as a post-process is that it doesn't account for the model mistaking other words for profanity. In that case a similar-sounding word should be chosen instead, rather than the word simply being dropped.

This is useful beyond handling profanity: there are some words VOSK sometimes thinks I'm saying that I virtually never use (at least not in the context of dictation). So it would be handy to tell VOSK never to select those words.

ideasman42 avatar Apr 21 '23 02:04 ideasman42

@ideasman42 I am using a large negative dictionary with an "update package" from https://alphacephei.com/vosk/lm#update-process. With this exclusion dictionary, I reduce en.dic and en-230k-0.5.lm.gz. This adaptation takes a few minutes.

svenha avatar Jun 07 '23 12:06 svenha
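
The dictionary half of that reduction could be sketched roughly as follows. This assumes `en.dic` is a plain-text pronunciation dictionary with one entry per line, the word first and its phones after it; `filter_dictionary` is a hypothetical helper, and the language model file would need a matching reduction that is not shown here.

```python
def filter_dictionary(lines, excluded):
    """Keep only dictionary entries whose word is not excluded.

    lines:    iterable of entries like "hello HH AH L OW" (assumed format)
    excluded: set of lowercase words to drop (the "negative dictionary")
    """
    kept = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0].lower() in excluded:
            continue  # drop every pronunciation of an excluded word
        kept.append(line.rstrip("\n"))
    return kept


sample = ["darn d aa r n\n", "dictation d ih k t ey sh ah n\n", "\n"]
print(filter_dictionary(sample, {"darn"}))
```

Because the word is removed before the graph is rebuilt, the model has to pick another candidate, which is the behaviour a post-process filter cannot provide.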