Added option to set grammar with custom lexicon

Open mmende opened this issue 1 year ago • 5 comments

This PR adds a new API method vosk_recognizer_set_grm_with_lexicon, which allows providing a custom pronunciation lexicon in addition to a grammar. The recognizer uses this lexicon to recreate the HCLr transducer at runtime, which allows recognizing words that were not in the lexicon before.
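As an illustration of what a caller might prepare, here is a minimal sketch that splits Kaldi-style lexicon lines (a word followed by its phones) into two parallel string vectors. This assumes that the two string vectors taken by RebuildLexicon (see the compiler output further down) are words and pronunciations, which this thread does not confirm. The LexiconInput struct, the parse_lexicon helper and the sample pronunciations are hypothetical and not part of the PR.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container: one word per entry, with its pronunciation
// (space-separated phones) stored at the same index.
struct LexiconInput {
  std::vector<std::string> words;
  std::vector<std::string> pronunciations;
};

// Parses lines of the form "<word> <phone> <phone> ...". Blank lines and
// lines that contain a word but no phones are skipped.
LexiconInput parse_lexicon(const std::string &text) {
  LexiconInput out;
  std::istringstream lines(text);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    std::string word, phone, pron;
    if (!(fields >> word)) continue;  // blank line
    while (fields >> phone) {
      if (!pron.empty()) pron += ' ';
      pron += phone;
    }
    if (pron.empty()) continue;  // word without a pronunciation
    out.words.push_back(word);
    out.pronunciations.push_back(pron);
  }
  // Invariant: both vectors stay index-aligned.
  assert(out.words.size() == out.pronunciations.size());
  return out;
}
```

Calling parse_lexicon on text such as "hallo h a l o" followed by "welt v E l t" on the next line would give two words with one pronunciation each. How these are passed on to the recognizer depends on the actual signature in the PR.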

To be able to recreate the HCLr transducer, the model must be a lookahead model and include the context dependency (tree) file and phone symbol table (phones.txt). In some rough, unscientific tests with vosk-model-small-de-0.15, the HCLr recreation for 10 words took ~15ms, 100 words took ~70ms, 500 words took ~430ms, 1000 words took about 1500ms.
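A caller could check these prerequisites before attempting to rebuild the lexicon. The following is only a sketch under assumptions: the missing_model_files helper is hypothetical, and the relative paths to pass in (for example am/tree and graph/phones.txt) are guesses at the model layout, not something this PR specifies.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Returns the entries of `required` (paths relative to `model_dir`) that
// are not regular files. An empty result means every file was found.
std::vector<std::string> missing_model_files(
    const fs::path &model_dir, const std::vector<std::string> &required) {
  std::vector<std::string> missing;
  for (const std::string &rel : required) {
    // The helper only makes sense for paths inside the model directory.
    assert(fs::path(rel).is_relative());
    std::error_code ec;  // non-throwing overload; errors count as missing
    if (!fs::is_regular_file(model_dir / rel, ec)) missing.push_back(rel);
  }
  return missing;
}
```

For example, missing_model_files(model_dir, {"am/tree", "graph/phones.txt"}) would list whichever of the two assumed files is absent, so an application can fall back to a plain grammar instead of failing at runtime.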

There are some hardcoded variables, such as the silence phone label (SIL), the silence probability, the self-loop scale and the transition scale. Grammar FSTs are not yet supported.

Furthermore, the method requires ~~that the epsilon entry (<eps>) is also in the given lexicon and~~ that phones with positional information are used correctly (if the model uses them).
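For models with word-position-dependent phones, Kaldi's usual convention is to suffix each phone with _B (word begin), _I (word internal), _E (word end) or _S (single-phone word). The sketch below applies that convention to one word's phones. The add_word_positions helper is hypothetical, and whether a given model uses these suffixes has to be checked against its phones.txt.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Appends Kaldi-style word-position suffixes to the phones of one word:
// _S for a single-phone word, otherwise _B (begin), _I (internal), _E (end).
std::vector<std::string> add_word_positions(
    const std::vector<std::string> &phones) {
  std::vector<std::string> out;
  out.reserve(phones.size());
  for (std::size_t i = 0; i < phones.size(); ++i) {
    const char *suffix = "_I";
    if (phones.size() == 1) {
      suffix = "_S";
    } else if (i == 0) {
      suffix = "_B";
    } else if (i + 1 == phones.size()) {
      suffix = "_E";
    }
    out.push_back(phones[i] + suffix);
  }
  return out;
}
```

With this helper, a pronunciation written as plain phones can be converted just before it is added to the custom lexicon, rather than maintaining the suffixes by hand.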

PS: Sorry for the whole reformatting stuff (that must have been the Clang-Format extension).

mmende avatar May 19 '23 07:05 mmende

Lovely idea, would love to use it :) But I'm unable to compile from the lexicon branch. Is there something missing?

g++ -g -O3 -std=c++17 -Wno-deprecated-declarations -fPIC -DFST_NO_DYNAMIC_LINKING -I. -I/opt/kaldi/src -I/opt/kaldi/tools/openfst/include  -I/opt/kaldi/tools/OpenBLAS/install/include -c -o recognizer.o recognizer.cc
In file included from recognizer.h:33,
                 from recognizer.cc:15:
model.h:98:3: error: ‘ContextDependency’ does not name a type
   98 |   ContextDependency *ctx_dep_ = nullptr;
      |   ^~~~~~~~~~~~~~~~~
recognizer.cc: In member function ‘void Recognizer::RebuildLexicon(std::vector<std::__cxx11::basic_string<char> >&, std::vector<std::__cxx11::basic_string<char> >&)’:
recognizer.cc:970:16: error: ‘class Model’ has no member named ‘phone_syms_loaded_’; did you mean ‘word_syms_loaded_’?
  970 |   if (!model_->phone_syms_loaded_ || model_->ctx_dep_ == nullptr) {
      |                ^~~~~~~~~~~~~~~~~~
      |                word_syms_loaded_
recognizer.cc:970:46: error: ‘class Model’ has no member named ‘ctx_dep_’
  970 |   if (!model_->phone_syms_loaded_ || model_->ctx_dep_ == nullptr) {
      |                                              ^~~~~~~~
recognizer.cc:1091:33: error: ‘class Model’ has no member named ‘ctx_dep_’
 1091 |   int32 context_width = model_->ctx_dep_->ContextWidth();
      |                                 ^~~~~~~~
recognizer.cc:1092:36: error: ‘class Model’ has no member named ‘ctx_dep_’
 1092 |   int32 central_position = model_->ctx_dep_->CentralPosition();
      |                                    ^~~~~~~~
recognizer.cc:1102:3: error: ‘HTransducerConfig’ was not declared in this scope
 1102 |   HTransducerConfig h_cfg;
      |   ^~~~~~~~~~~~~~~~~
recognizer.cc:1103:3: error: ‘h_cfg’ was not declared in this scope
 1103 |   h_cfg.transition_scale = transition_scale;
      |   ^~~~~
recognizer.cc:1109:40: error: ‘class Model’ has no member named ‘ctx_dep_’
 1109 |       GetHTransducer(ilabels, *model_->ctx_dep_, *model_->trans_model_, h_cfg,
      |                                        ^~~~~~~~
recognizer.cc:1109:7: error: ‘GetHTransducer’ was not declared in this scope
 1109 |       GetHTransducer(ilabels, *model_->ctx_dep_, *model_->trans_model_, h_cfg,
      |       ^~~~~~~~~~~~~~
recognizer.cc:1131:59: error: no matching function for call to ‘AddSelfLoops(kaldi::TransitionModel&, std::vector<int>&, float&, bool&, bool&, fst::VectorFst<fst::ArcTpl<fst::TropicalWeightTpl<float> > >*)’
 1131 |                reorder, check_no_self_loops, &composed_fst);
      |                                                           ^
In file included from /opt/kaldi/src/fstext/pre-determinize.h:94,
                 from /opt/kaldi/src/fstext/fstext-utils-inl.h:29,
                 from /opt/kaldi/src/fstext/fstext-utils.h:425,
                 from /opt/kaldi/src/fstext/deterministic-fst-inl.h:25,
                 from /opt/kaldi/src/fstext/deterministic-fst.h:333,
                 from /opt/kaldi/src/fstext/grammar-context-fst.h:51,
                 from /opt/kaldi/src/decoder/grammar-fst.h:36,
                 from /opt/kaldi/src/decoder/lattice-faster-decoder.h:26,
                 from recognizer.h:21,
                 from recognizer.cc:15:
/opt/kaldi/src/fstext/pre-determinize-inl.h:599:26: note: candidate: ‘template<class Arc> void fst::AddSelfLoops(fst::MutableFst<Arc>*, std::vector<typename Arc::Label>&, std::vector<typename Arc::Label>&)’
  599 | template<class Arc> void AddSelfLoops(MutableFst<Arc> *fst, std::vector<typename Arc::Label> &isyms,
      |                          ^~~~~~~~~~~~
/opt/kaldi/src/fstext/pre-determinize-inl.h:599:26: note:   template argument deduction/substitution failed:
recognizer.cc:1131:59: note:   mismatched types ‘fst::MutableFst<Arc>*’ and ‘kaldi::TransitionModel’
 1131 |                reorder, check_no_self_loops, &composed_fst);
      |                                                           ^
make: *** [Makefile:112 : recognizer.o] Erreur 1


Shallowmallow avatar Aug 03 '23 13:08 Shallowmallow

I'll try to fix it in the next few days.

mmende avatar Aug 03 '23 13:08 mmende

@Shallowmallow you should be able to compile it now. Let me know if it worked or not...

mmende avatar Aug 03 '23 14:08 mmende

Indeed, it compiles. Thanks @mmende !

Shallowmallow avatar Aug 04 '23 07:08 Shallowmallow

If there were a function to activate and deactivate multiple grammars, this could allow for context- or application-specific grammars.

The use case would be multiple programs, each with its own grammar. When the foreground window changes on the client side, the appropriate grammar would be activated, while the grammars that are not relevant would be disabled but stay loaded in the back end.
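Roughly, the bookkeeping for this could look like the sketch below. Everything in it is hypothetical (the GrammarRegistry class and its methods are not part of vosk-api or this PR), and it only tracks which grammars are loaded and active; it does not touch a recognizer.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical registry: every grammar stays loaded, but only those
// marked active would be offered to the recognizer.
class GrammarRegistry {
 public:
  // Loads (or replaces) a grammar under `name`; it starts inactive.
  void load(const std::string &name, const std::string &grammar_json) {
    entries_[name] = Entry{grammar_json, false};
  }

  // Marks a loaded grammar active or inactive; returns false if unknown.
  bool set_active(const std::string &name, bool active) {
    auto it = entries_.find(name);
    if (it == entries_.end()) return false;
    it->second.active = active;
    return true;
  }

  // Activates exactly one grammar and deactivates all others, e.g. when
  // the foreground window changes. Returns false and leaves the state
  // untouched if `name` was never loaded.
  bool switch_to(const std::string &name) {
    if (entries_.find(name) == entries_.end()) return false;
    for (auto &kv : entries_) kv.second.active = (kv.first == name);
    return true;
  }

  // Names of the currently active grammars, sorted by name.
  std::vector<std::string> active() const {
    std::vector<std::string> names;
    for (const auto &kv : entries_) {
      if (kv.second.active) names.push_back(kv.first);
    }
    return names;
  }

  std::size_t loaded_count() const { return entries_.size(); }

 private:
  struct Entry {
    std::string grammar_json;
    bool active;
  };
  std::map<std::string, Entry> entries_;
};
```

A client would call load once per application and switch_to whenever the foreground window changes; the inactive grammars remain in the map, so switching back does not require reloading them.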

LexiconCode avatar May 03 '24 15:05 LexiconCode