vosk-api
Unknown words in text.txt when updating the language model
I want to build a new grammar from a text.txt. All the commands work except the last one:
farcompilestrings --fst_type=compact --symbols=words.txt --keep_symbols text.txt |
ngramcount | ngrammake |
fstconvert --fst_type=ngram > Gr.new.fst
- If all the words in text.txt are in words.txt => OK
- If text.txt contains new words (words missing from words.txt) => errors such as:
FATAL: FarCompileStrings: Compiling string number 2 in file text.txt failed with token_type = symbol and entry_type = line
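Before running the pipeline, it can help to list exactly which tokens in text.txt are missing from words.txt. Below is a minimal sketch, assuming words.txt is a Kaldi/OpenFst symbol table with one "word id" pair per line, and that farcompilestrings splits each line of text.txt on whitespace (an assumption about its tokenizer). The script name find_oov.py is hypothetical.

```python
import sys


def load_symbols(symbol_lines):
    """Collect the words (first column) of a "word id" symbol table."""
    vocab = set()
    for line in symbol_lines:
        parts = line.split()
        if parts:
            vocab.add(parts[0])
    return vocab


def find_unknown_words(text_lines, vocab):
    """Return (line_number, word) pairs for tokens that are not in vocab.

    Each line is treated as one string and split on whitespace; this is
    assumed to approximate token_type=symbol with entry_type=line.
    """
    unknown = []
    for number, line in enumerate(text_lines, start=1):
        for word in line.split():
            if word not in vocab:
                unknown.append((number, word))
    return unknown


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python find_oov.py words.txt text.txt
    with open(sys.argv[1], encoding="utf-8") as f:
        vocab = load_symbols(f)
    with open(sys.argv[2], encoding="utf-8") as f:
        for number, word in find_unknown_words(f, vocab):
            print(f"line {number}: {word}")
```

Every reported word is one that would make the farcompilestrings step fail.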
I read the --help output and tried this command instead:
farcompilestrings --fst_type=compact --symbols=words.txt --unknown_symbol="" --keep_symbols text.txt |
ngramcount | ngrammake |
fstconvert --fst_type=ngram > Gr.new.fst
Another error was raised:
FATAL: FarCompileStrings: Label "-1" missing from symbol table: words.txt
FATAL: STListReader::STListReader: Wrong file type:
I know that "You can not introduce new words this way, that is something we will cover later.", but is there any way to deal with new words in a big text? Any help is appreciated. Thanks in advance!
Are there any ways to deal with "new words" in a big text?
This method cannot introduce new words; you have to recompile the whole graph (see the last section).
Thank you! Can I use something like "unk" to replace the new words?
Yes, it is "[unk]" as in the example code.
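One way to apply this to a big text is to map every out-of-vocabulary token to "[unk]" before running the pipeline. Below is a minimal sketch, assuming "[unk]" is present in the model's words.txt (the script checks this) and that tokens are whitespace-separated, one sentence per line. The script name replace_oov.py and the output file text.unk.txt are hypothetical.

```python
import sys

# Assumed to exist in the model's words.txt; checked before writing output.
UNK = "[unk]"


def load_symbols(symbol_lines):
    """Collect the words (first column) of a "word id" symbol table."""
    vocab = set()
    for line in symbol_lines:
        parts = line.split()
        if parts:
            vocab.add(parts[0])
    return vocab


def replace_unknown(text_lines, vocab, unk=UNK):
    """Replace tokens missing from vocab with unk; drop blank lines."""
    out = []
    for line in text_lines:
        tokens = [word if word in vocab else unk for word in line.split()]
        if tokens:
            out.append(" ".join(tokens))
    return out


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage: python replace_oov.py words.txt text.txt text.unk.txt
    with open(sys.argv[1], encoding="utf-8") as f:
        vocab = load_symbols(f)
    if UNK not in vocab:
        sys.exit(f"{UNK} is not in {sys.argv[1]}; choose another unknown token")
    with open(sys.argv[2], encoding="utf-8") as f:
        lines = replace_unknown(f, vocab)
    with open(sys.argv[3], "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
```

The rewritten file can then go through the same farcompilestrings pipeline. Note that the replaced words still will not be recognized as themselves; "[unk]" only keeps the surrounding word sequences in the language model.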
Can you please share the format of text.txt?