TensorFlowASR A question on the language model

Open wenjingyang opened this issue 4 years ago • 0 comments

Hi, I also have a problem related to this ticket (https://github.com/TensorSpeech/TensorFlowASR/issues/44). I generated a CTC model with deepspeech2 and used a language model and vocabulary downloaded from http://www.openslr.org/11/ (3-gram.arpa, converted to binary). When I run the test, there is no beam_lm output, as shown below:

PATH: LibriSpeech/test-clean/6930/75918/6930-75918-0000.flac
GROUNDTRUTH: concord returned to its place amidst the tents
GREEDY: colmng qhuorid re teurned wo its place afm midst hat tens
BEAMSEARCH: colmng qhlori re teurned tw its place afm midst hat tens
BEAMSEARCHLM: (empty)

So I generated a very simple language model based on this case with kenlm. With it I do get a beam_lm result, but the result looks very weird:

PATH: LibriSpeech/test-clean/6930/75918/6930-75918-0000.flac
GROUNDTRUTH: concord returned to its place amidst the tents
GREEDY: colmng qhuorid re teurned wo its place afm midst hat tens
BEAMSEARCH: colmng qhlori re teurned tw its place afm midst hat tens
BEAMSEARCHLM: kskskshkskskskshhn

The "kskskshkskskskshhn" string is the beam_lm output. I followed https://github.com/usimarit/ctc_decoders/blob/master/example/decode.py. My code is below:

    alpha = 2.
    beta = 1.
    lm_path = r'./models/lm_sample/sample.bin'
    vocabulary = r'./models/lm_sample/sample.voc'
    vocab = load_vocab(vocabulary)
    vocab[-1] = '_'
    print(vocab)
    scorer = Scorer(alpha=alpha, beta=beta, model_path=lm_path, vocabulary=vocab[:-1])
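For context on the `alpha` and `beta` values passed to `Scorer`: in Deep Speech 2 style decoders, `alpha` usually weights the language model log-probability and `beta` is a per-word insertion bonus. The sketch below only illustrates that convention with made-up numbers; `fused_score` is a hypothetical helper, not part of ctc_decoders, and I have not confirmed that the library combines scores exactly this way.

```python
import math


def fused_score(log_p_ctc: float, log_p_lm: float, word_count: int,
                alpha: float = 2.0, beta: float = 1.0) -> float:
    """Score one beam hypothesis (hypothetical helper, assumed convention).

    log_p_ctc  : log-probability of the hypothesis from the CTC model
    log_p_lm   : log-probability of the same text from the language model
    word_count : number of words in the hypothesis
    alpha      : language model weight
    beta       : word insertion bonus
    """
    return log_p_ctc + alpha * log_p_lm + beta * word_count


# Made-up numbers: a plausible 8-word sentence versus a single "word"
# that the language model considers extremely unlikely.
good = fused_score(math.log(0.02), math.log(0.01), word_count=8)
bad = fused_score(math.log(0.03), math.log(1e-6), word_count=1)

print(good > bad)
```

Under this convention, a hypothesis that the language model rates as very unlikely should lose to a plausible one, so a garbage string winning may point at a mismatch between the acoustic model's output labels and the vocabulary given to the scorer.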

Is there anything I missed? Any help would be appreciated. Thanks.

Jan 28 '21 07:01 wenjingyang
