Missing documentation: Import of a custom kaldi model

Open JohnDoe02 opened this issue 3 years ago • 9 comments

What steps are necessary to import a custom kaldi model (trained from scratch, not transfer-learned as in #33) into KAG?

In the readme it is currently stated that:

Or use your own model. Standard Kaldi models must be converted to be usable. Conversion can be performed automatically, but this hasn't been fully implemented yet.

What steps are necessary to kick off the mentioned partial implementation for automatic conversion? What steps remain to be carried out by the user?

JohnDoe02 avatar Oct 21 '20 17:10 JohnDoe02

How much work it takes depends on the model's configuration (perhaps unsurprising since Kaldi is so configurable). If you are performing the training with the intent to use it with KaldiAG, it can be made quite easy. It's been a while since I converted the Zamia model, so I may be forgetting something, but as I recall...

  • If you do the training starting with the lexicon/phones from one of my models, I think you should be able to just copy in the relevant model files from your trained model, overwriting the ones in my published model, just as you have for the fine tuning.
  • If you are using the same phones, but a different lexicon, it's a bit more complicated, but you should be able to massage the words file to fit. However, there are currently a few hard-coded constants in KaldiAG for words and phones.
  • For different phones, it is similar but more work. I haven't attempted to do this conversion yet.

FWIW, the unfinished and untested converter is in model.py: see convert_generic_model_to_agf().

daanzu avatar Oct 23 '20 14:10 daanzu

Got it to work! It turns out I had only forgotten to rename splice_opts to splice.conf. One also has to add a line break in that file, as otherwise parsing it crashes KAG. But that was it.
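The rename described above can be sketched in Python. This is only an illustration, not part of KAG: the function name `install_splice_conf` is hypothetical, and the thread does not say where the line break has to go. The sketch assumes it belongs between the options, since Kaldi config files hold one `--option=value` per line while `splice_opts` usually keeps all options on a single line.

```python
from pathlib import Path


def install_splice_conf(splice_opts, conf_dir):
    """Write conf_dir/splice.conf from Kaldi's splice_opts, one option per line."""
    # splice_opts typically looks like "--left-context=3 --right-context=3"
    options = Path(splice_opts).read_text().split()
    conf_dir = Path(conf_dir)
    conf_dir.mkdir(parents=True, exist_ok=True)
    target = conf_dir / "splice.conf"
    # Assumption: every option goes on its own line, each ending in a newline
    target.write_text("".join(option + "\n" for option in options))
    return target
```

For example, `install_splice_conf("exp/nnet3_cleaned/extractor/splice_opts", "final_model/conf")` would replace the corresponding `cp` plus the manual edit.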

Just for the record: I used the phone set of your daanzu_20200905 model for my training, but added a number of words to the lexicon. However, I believe this alone has no impact on integration with KAG as long as I don't need those extra words for dictation. They simply live in my user_lexicon.txt

JohnDoe02 avatar Oct 23 '20 16:10 JohnDoe02

So I spoke too soon. While everything that uses non-dictation commands works like a charm, dictation is broken: I only get garbage, nothing in any way related to what I said. It does indeed look as if some IDs don't match.

However, I don't really know what the root cause is. This is what I am using to create the model dir:

cp -r kaldi_model final_model
cp conf/mfcc.conf final_model/conf
cp conf/mfcc_hires.conf final_model/conf
cp conf/online_cmvn.conf final_model/conf
cp exp/nnet3_cleaned/extractor/splice_opts final_model/conf/splice.conf
cp exp/nnet3_cleaned/ivectors_jd_ls_100_clean_sp_hires/conf/ivector_extractor.conf final_model/conf

cp exp/nnet3_cleaned/extractor/final.* final_model/ivector_extractor
cp exp/nnet3_cleaned/extractor/global_cmvn.stats final_model/ivector_extractor

cp exp/chain_cleaned/tdnn_1d_sp/final.mdl final_model/
cp exp/chain_cleaned/tdnn_1d_sp/tree final_model/
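
The copy steps above can also be written as a small Python sketch, which makes it easier to spot a source file that does not exist. The names `COPY_PLAN` and `copy_model_files` are hypothetical, the paths are taken verbatim from the commands above (adjust them to your own experiment names), and `final_model` is assumed to already be a copy of an existing KAG model, as in the first `cp -r`.

```python
import shutil
from pathlib import Path

# (source pattern relative to the recipe dir, destination subdir, new name or None)
COPY_PLAN = [
    ("conf/mfcc.conf", "conf", None),
    ("conf/mfcc_hires.conf", "conf", None),
    ("conf/online_cmvn.conf", "conf", None),
    ("exp/nnet3_cleaned/extractor/splice_opts", "conf", "splice.conf"),
    ("exp/nnet3_cleaned/ivectors_jd_ls_100_clean_sp_hires/conf/ivector_extractor.conf",
     "conf", None),
    ("exp/nnet3_cleaned/extractor/final.*", "ivector_extractor", None),
    ("exp/nnet3_cleaned/extractor/global_cmvn.stats", "ivector_extractor", None),
    ("exp/chain_cleaned/tdnn_1d_sp/final.mdl", ".", None),
    ("exp/chain_cleaned/tdnn_1d_sp/tree", ".", None),
]


def copy_model_files(recipe_dir, model_dir, plan=COPY_PLAN):
    """Copy every file matched by the plan; return the patterns that matched nothing."""
    recipe_dir, model_dir = Path(recipe_dir), Path(model_dir)
    unmatched = []
    for pattern, dest_rel, new_name in plan:
        matches = sorted(p for p in recipe_dir.glob(pattern) if p.is_file())
        if not matches:
            unmatched.append(pattern)
            continue
        dest_dir = model_dir / dest_rel
        dest_dir.mkdir(parents=True, exist_ok=True)
        for src in matches:
            # Rename only where the plan asks for it (splice_opts -> splice.conf)
            shutil.copyfile(src, dest_dir / (new_name or src.name))
    return unmatched
```

Calling `copy_model_files(".", "final_model")` from the recipe directory returns an empty list when every source was found.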

JohnDoe02 avatar Oct 23 '20 16:10 JohnDoe02

@JohnDoe02 Ah, I forgot about the dictation FST! You will need to re-compile it using your new .mdl file. Try:

python3 -m kaldi_active_grammar compile_agf_dictation_graph -m kaldi_model_dir/G.fst -v

daanzu avatar Oct 24 '20 10:10 daanzu

Just for reference:

python3 -m kaldi_active_grammar compile_agf_dictation_graph -m final_model/ -v

did the trick.
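
If the recompilation is part of a larger build script, the same command can be launched from Python. This is a minimal sketch: the helper names are hypothetical, the subcommand and flags are copied exactly from the command above, and `sys.executable` is assumed to be the interpreter that has `kaldi_active_grammar` installed.

```python
import subprocess
import sys


def dictation_compile_command(model_dir, verbose=True):
    """Build the argument list for the compile command quoted in this thread."""
    cmd = [sys.executable, "-m", "kaldi_active_grammar",
           "compile_agf_dictation_graph", "-m", str(model_dir)]
    if verbose:
        cmd.append("-v")
    return cmd


def compile_dictation_graph(model_dir):
    """Run the compile step; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(dictation_compile_command(model_dir), check=True)
```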

JohnDoe02 avatar Oct 24 '20 14:10 JohnDoe02

Could you please provide the general steps to adapt Kaldi models trained for languages other than English?

widdiot avatar Mar 08 '21 10:03 widdiot

I want to convert any of the following Mandarin Chinese models to be compatible with KAG. Thanks for any help or documentation.

  • http://kaldi-asr.org/models/m2
  • http://kaldi-asr.org/models/m10
  • http://kaldi-asr.org/models/m11

I have no experience with Kaldi. Currently the only environment I can run is the one from kaldi-dragonfly-winpython37.zip. Once I have a usable model, I will develop my application with dragonfly.

And I know something about CMUSphinx. I tried Sphinx4 and found that it lacked some features I needed. So I switched to dragonfly/KAG. The English model in kaldi-dragonfly-winpython37.zip perfectly meets my needs, but my program needs to support more languages, especially Chinese.

SwimmingTiger avatar Apr 17 '21 20:04 SwimmingTiger

Similar discussion in #21.

daanzu avatar Apr 19 '21 04:04 daanzu

If it can still be of help or interest to anyone, I have recently been working on importing my own custom French models into KAG. After testing them, I have found them to be functional and well-performing, although I still need to check some configurations to improve the WER.

To do this, I first performed an acoustic training (HMM-DNN nnet3 chain models) with Kaldi based on 1000h of French speech. Once it was done, I created a folder to dump my KAG custom model in:

KAG_DIR="kag_model"
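# Create the subdirectories that the copy commands below write into
mkdir -p ${KAG_DIR}/conf ${KAG_DIR}/ivector_extractor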

And I subsequently copied the files coming from my training (as pointed out by @JohnDoe02). In my case:

cp conf/mfcc.conf ${KAG_DIR}/conf
cp conf/mfcc_hires.conf ${KAG_DIR}/conf
cp conf/online_cmvn.conf ${KAG_DIR}/conf

cp exp/nnet3/extractor/splice_opts ${KAG_DIR}/conf/splice.conf
cp exp/nnet3/ivectors_train_nodup_sp/conf/ivector_extractor.conf ${KAG_DIR}/conf

cp -r exp/nnet3/extractor/final.* ${KAG_DIR}/ivector_extractor/
cp exp/nnet3/extractor/global_cmvn.stats ${KAG_DIR}/ivector_extractor/

cp exp/chain/tdnn_ceos_sp_online/final.mdl ${KAG_DIR}/
cp exp/chain/tdnn_ceos_sp_online/tree ${KAG_DIR}/

Once this was done, I proceeded to compile my language model. To make it work with KAG, I had to deal with KAG's hard-coded constants for words and phones. To do so, the nonterminals.txt file used by KAG (it can be found in any of the available models) has to be added to the folder where my pronunciation models are located:

cp nonterminals.txt ${LEXICON_DIR}/dict

I then ran the data preparation with Kaldi:

./utils/prepare_lang.sh <dict-src-dir> <oov-dict-entry> <tmp-dir> <lang-dir>
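
A quick sanity check after this step is to confirm that the nonterminal symbols actually made it into the generated lang directory. The sketch below is not part of KAG or Kaldi, and both function names are hypothetical. It assumes that `prepare_lang.sh` adds each symbol listed in `nonterminals.txt` to `phones.txt`, and that both files carry the symbol in the first whitespace-separated column.

```python
from pathlib import Path


def read_symbols(path):
    """Return the first whitespace-separated field of every non-empty line."""
    symbols = []
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        if fields:
            symbols.append(fields[0])
    return symbols


def missing_nonterminals(nonterminals_txt, phones_txt):
    """List symbols from nonterminals.txt that do not appear in phones.txt."""
    known = set(read_symbols(phones_txt))
    return [symbol for symbol in read_symbols(nonterminals_txt) if symbol not in known]
```

An empty result from `missing_nonterminals("dict/nonterminals.txt", "lang/phones.txt")` suggests the nonterminals were picked up; the two paths here are placeholders for your own `<dict-src-dir>` and `<lang-dir>`.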

Once this process is finished, we can copy the following files to the folder that will contain our KAG model:

cp ${LANG_DIR}/G.fst ${KAG_DIR}

cp ${LANG_DIR}/words.txt ${KAG_DIR}/words.txt
cp ${LANG_DIR}/words.txt ${KAG_DIR}/words.base.txt # Same as previous file
cp ${LANG_DIR}/words.txt ${KAG_DIR}/words.nonterm.txt # Just including nonterminals

cp ${LANG_DIR}/phones/align_lexicon.int ${KAG_DIR}
cp ${LANG_DIR}/phones/align_lexicon.int ${KAG_DIR}/align_lexicon.base.int # Same as previous file
cp ${LANG_DIR}/phones/align_lexicon.int ${KAG_DIR}/align_lexicon.nonterm.int # Just including nonterminals

cp ${LANG_DIR}/phones/disambig.int ${KAG_DIR}
cp ${LANG_DIR}/phones/left_context_phones.txt ${KAG_DIR}
cp -r ${LANG_DIR}/phones/wdisambig_* ${KAG_DIR}

cp ${LANG_DIR}/phones.txt ${KAG_DIR}
cp ${LANG_DIR}/phones.txt ${KAG_DIR}/phones.nonterm.txt # Just including nonterminals

cp ${LEXICON_DIR}/L_disambig.fst ${KAG_DIR}
cp ${LEXICON_DIR}/dict/lexicon.txt ${KAG_DIR}
cp ${LEXICON_DIR}/dict/lexiconp.txt ${KAG_DIR}

cp ${LEXICON_DIR}/tmp/lexiconp_disambig.txt ${KAG_DIR}
cp ${LEXICON_DIR}/tmp/lexiconp_disambig.txt ${KAG_DIR}/lexiconp_disambig.base.txt # Same as previous file

touch user_lexicon.txt # Initially empty
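
Before compiling the dictation graph, it can help to check that nothing was missed. The following sketch simply tests for the files named in the copy commands of this post. `EXPECTED` and `missing_model_files` are hypothetical names, the list is inferred from this post rather than from an authoritative KAG specification, and the `wdisambig_*` and `final.*` globs as well as `user_lexicon.txt` are left out because their exact names or locations are not spelled out here.

```python
from pathlib import Path

# File names inferred from the cp commands in this post (relative to the model dir)
EXPECTED = [
    "final.mdl", "tree", "G.fst",
    "words.txt", "words.base.txt", "words.nonterm.txt",
    "align_lexicon.int", "align_lexicon.base.int", "align_lexicon.nonterm.int",
    "disambig.int", "left_context_phones.txt",
    "phones.txt", "phones.nonterm.txt",
    "L_disambig.fst", "lexicon.txt", "lexiconp.txt",
    "lexiconp_disambig.txt", "lexiconp_disambig.base.txt",
    "conf/mfcc.conf", "conf/mfcc_hires.conf", "conf/online_cmvn.conf",
    "conf/splice.conf", "conf/ivector_extractor.conf",
    "ivector_extractor/global_cmvn.stats",
]


def missing_model_files(model_dir, expected=EXPECTED):
    """Return the expected files that are not present under model_dir."""
    model_dir = Path(model_dir)
    return [name for name in expected if not (model_dir / name).is_file()]
```

Running `missing_model_files("kag_model")` and getting an empty list means every file from the list above is in place.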

Finally, the dictation graph is compiled with the following command:

python3 -m kaldi_active_grammar compile_agf_dictation_graph -m kag_model/ -v

In this way, I managed to create a custom KAG model for French. I hope it is of some help.

In any case, once convert_generic_model_to_agf() is finished, I am sure the procedure will be much easier.

Lucía

lormaechea avatar Aug 30 '21 12:08 lormaechea