vosk-api
Can the VOSK grammar file be used to exclude words?
Based on this issue about NSFW words being included (https://github.com/ideasman42/nerd-dictation/issues/99), it would be useful to know whether the VOSK grammar file can be used to exclude words (instead of limiting recognition to them).
Is this currently supported? The only documentation I found for the VOSK grammar file is in the source header, which is not very detailed.
You can exclude such words in a postprocessing step; there is no need for a grammar.
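A minimal sketch of such a postprocessing step, assuming the recognizer result is the JSON string with a "text" field that Vosk returns. The function name `filter_result` and the word list are hypothetical, chosen only for illustration:

```python
import json


def filter_result(result_json, excluded):
    """Drop excluded words from the "text" field of a Vosk-style JSON result.

    `excluded` is a set of lowercase words (hypothetical list, supplied by the caller).
    """
    result = json.loads(result_json)
    words = result.get("text", "").split()
    # Keep only words that are not in the exclusion set.
    kept = [w for w in words if w.lower() not in excluded]
    result["text"] = " ".join(kept)
    return result


# Stand-in for the string returned by the recognizer.
sample = '{"text": "well darn that is odd"}'
print(filter_result(sample, {"darn"})["text"])
```

This only deletes the unwanted word; it cannot replace it with whatever the recognizer would have chosen otherwise.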
Same as https://github.com/alphacep/vosk-api/issues/623 I suppose
The problem with excluding words as a post-process is that it doesn't account for the model mistaking other words for profanity; in that case a similar-sounding word should be chosen instead of the word simply being dropped.
This is useful outside of handling profanity too: there are some words VOSK sometimes thinks I'm saying that I virtually never use (at least not in the context of dictation). So it would be handy to let VOSK know never to select those words.
@ideasman42 I am using a large negative dictionary for an "update package" from https://alphacephei.com/vosk/lm#update-process
With this exclusion dictionary, I reduce en.dic and en-230k-0.5.lm.gz. This adaptation takes a few minutes.
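As a rough sketch of the dictionary half of that idea (not necessarily the script used above): assuming each line of en.dic starts with the word followed by its phonemes, which is an assumption about the file layout, entries for excluded words can be filtered out like this. The helper name `filter_dictionary` and the sample entries are hypothetical, and the language model would need equivalent pruning, which is not shown here.

```python
def filter_dictionary(lines, excluded):
    """Return dictionary lines whose first token is not an excluded word.

    Assumes one entry per line in the form "word phoneme phoneme ..."
    (an assumption about the en.dic layout).
    """
    kept = []
    for line in lines:
        parts = line.split()
        # Skip entries whose headword is in the exclusion set.
        if parts and parts[0].lower() in excluded:
            continue
        kept.append(line)
    return kept


# Made-up entries standing in for the contents of en.dic.
entries = ["darn d aa r n", "dart d aa r t", "heck hh eh k"]
print(filter_dictionary(entries, {"darn", "heck"}))
```

In practice the lines would be read from en.dic and the result written to a new file before running the update process.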