
Support for decoding of TDNN-LSTM nnet3?

Open v-yunbin opened this issue 3 years ago • 8 comments

By default, the Vosk API supports decoding of TDNN nnet3 models. I ran some experiments: TDNN-LSTM is clearly better than TDNN, with a 3% gain over it. TDNN-LSTM needs the following parameters: extra_left_context=0 extra_right_context=0 extra_left_context_initial=-1 extra_right_context_final=-1, but I cannot find these parameters in the source code.

v-yunbin avatar May 24 '21 05:05 v-yunbin

You can set these parameters in model/conf/model.conf I believe, please try.

nshmyrev avatar May 24 '21 05:05 nshmyrev

@nshmyrev I tried it and got this error:

Command line was:
ERROR (VoskAPI:ReadConfigFile():parse-options.cc:493) Invalid option --extra-left-context 50 in config file model_tdnn_lstm//conf/model.conf
terminate called after throwing an instance of 'kaldi::KaldiFatalError'
  what():  kaldi::KaldiFatalError
Aborted (core dumped)

v-yunbin avatar Jun 01 '21 14:06 v-yunbin

--extra-left-context 50

Looks like you forgot =, it should be --extra-left-context=50
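
As a rough illustration of the formatting point only, here is a minimal Python sketch that flags config lines whose option and value are separated by whitespace instead of `=`. The function name `find_malformed_options` and the sample text are hypothetical, and the accepted line shape (`--name=value` or a bare `--name`, with `#` comments) is an assumption about the config format rather than the parser's exact rules. It does not check whether the decoder actually recognizes an option.

```python
import re

# Assumed line shape: --name=value, or a bare --name flag.
OPTION_RE = re.compile(r"^--[A-Za-z0-9_.\-]+(=.*)?$")


def find_malformed_options(conf_text):
    """Return (line_number, line) pairs that do not look like --name=value."""
    bad = []
    for number, raw in enumerate(conf_text.splitlines(), start=1):
        # Drop trailing comments and surrounding whitespace (assumed syntax).
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if not OPTION_RE.match(line):
            bad.append((number, line))
    return bad


# Hypothetical sample: the second line uses a space instead of "=".
sample = "--beam=12.0\n--extra-left-context 50\n--lattice-beam=5.0\n"
print(find_malformed_options(sample))
```

A well-formed line can still be rejected if the option itself is unknown to the program reading the file, so this only rules out the missing `=` case.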

nshmyrev avatar Jun 01 '21 18:06 nshmyrev

--extra-left-context 50

Looks like you forgot =, it should be --extra-left-context=50

I corrected it but still get the same error; maybe this network is not supported:

Command line was:
ERROR (VoskAPI:ReadConfigFile():parse-options.cc:493) Invalid option --extra-left-context=50 in config file model_tdnn_lstm//conf/model.conf
terminate called after throwing an instance of 'kaldi::KaldiFatalError'
  what():  kaldi::KaldiFatalError
Aborted (core dumped)

v-yunbin avatar Jun 02 '21 07:06 v-yunbin

@nshmyrev How should I modify the source code? Only “--extra-left-context-initial” is supported.

v-yunbin avatar Jun 09 '21 10:06 v-yunbin

It seems that online LSTM decoding has not been implemented: https://github.com/kaldi-asr/kaldi/issues/1091

v-yunbin avatar Jun 23 '21 08:06 v-yunbin

Hi @nshmyrev, I have trained my own model, and its performance is great in terms of both accuracy and speed. I found that increasing the beam gives better accuracy, but changing the lattice beam in model.conf had no effect.

--min-active=200
--max-active=3000
--beam=12.0
--lattice-beam=5.0
--acoustic-scale=1.0
--frame-subsampling-factor=3
--endpoint.silence-phones=1:2:3:4:5:6:7:8:9:10
--endpoint.rule2.min-trailing-silence=0.5
--endpoint.rule3.min-trailing-silence=0.75
--endpoint.rule4.min-trailing-silence=1.0

This is my model.conf file. Which values for beam, lattice_beam, and the other options do you think would give better accuracy without compromising performance?
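
One way to explore this empirically is to generate a few model.conf variants and measure accuracy and speed for each. The sketch below only rewrites the config text; `override_options` is a hypothetical helper, and the beam values in the loop are arbitrary placeholders, not recommendations.

```python
def override_options(conf_text, overrides):
    """Return conf_text with the given --name options replaced or appended.

    Options are split on whitespace, so both one-per-line and
    single-line config text are accepted. Output is one option per line.
    """
    remaining = dict(overrides)
    out = []
    for token in conf_text.split():
        name, sep, _ = token.partition("=")
        if sep and name in remaining:
            # Replace the value of an option that is being overridden.
            token = f"{name}={remaining.pop(name)}"
        out.append(token)
    # Append overrides that were not present in the original text.
    out.extend(f"{name}={value}" for name, value in remaining.items())
    return "\n".join(out) + "\n"


base = "--min-active=200 --max-active=3000 --beam=12.0 --lattice-beam=5.0"
for beam in (10.0, 13.0, 16.0):  # placeholder values for a sweep
    print(override_options(base, {"--beam": beam}))
```

Each variant would then be written to its own model directory and evaluated on the same test set, so that accuracy and decoding time can be compared side by side.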

Ramanibharathi avatar Jun 17 '22 07:06 Ramanibharathi

You should open a new issue; your question does not seem to be related to mine.

v-yunbin avatar Jul 21 '22 10:07 v-yunbin