kaldi-gstreamer-server
RNN LM Rescoring
Hi, I just wanted to ask, is there any way to rescore the lattice using recurrent neural network language model with the current kaldi-gstreamer setup?
Thanks a lot!
It's theoretically possible using the post-processing framework and n-best lists, but it would be quite complicated.
@jin1004 I'm not sure which RNN library you're using, but if it has Python bindings you could do something like this:
- create a new version of the sample_full_post_processor.py file
- import the rescoring method from your rnnlm library (and any other dependencies for it) in that file. In this example we will call the method "rescore_sentence" and assume that it returns a likelihood for the given sentence
- find the line in the post_process_json method that reads "if len(event["result"]["hypotheses"]) > 1:" and insert a line after it
- on that line write this (all on one line): `event['result']['hypotheses'] = sorted(event['result']['hypotheses'], key=lambda x: rescore_sentence(x['transcript']), reverse=True)`
- save the file
- in your xxx.yaml file, replace "sample_full_post_processor.py" with the name of your new post-processor file
- save that file and make sure your worker is using it
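The steps above can be sketched roughly as below. This is a minimal, self-contained illustration rather than the actual `sample_full_post_processor.py`: `rescore_sentence` is the hypothetical method imported from your RNNLM library, and the dummy body here (which just scores by sentence length) only exists so the example runs.

```python
def rescore_sentence(sentence):
    # Placeholder for your RNNLM library's scoring call; it should
    # return a likelihood for the sentence (higher = better).
    # Here we use a dummy length-based score purely for illustration.
    return float(len(sentence))

def post_process_json(event):
    # Mirrors the shape of the event dicts handled by the sample
    # post-processor: reorder the n-best list so the hypothesis the
    # RNNLM likes best comes first.
    if "result" in event and len(event["result"]["hypotheses"]) > 1:
        event["result"]["hypotheses"] = sorted(
            event["result"]["hypotheses"],
            key=lambda x: rescore_sentence(x["transcript"]),
            reverse=True)
    return event
```

The client then simply reads the first hypothesis as usual, so nothing downstream needs to change.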
It should now return the hypothesis transcript with the highest likelihood under your RNNLM. That's probably the simplest way to do it, although you would obviously need to add checks to make sure the model loads correctly, the library can be imported, etc., so that a failure won't break the whole program.
If you are using a library without Python bindings (I don't think faster-rnnlm has any, for example), then you would probably need to spawn a subprocess or something similar, which would slow things down and potentially introduce additional complications. If speed isn't an urgent concern, that approach could still work.
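A subprocess-based scorer might look like the sketch below. The command here (`wc -w`, i.e. word count) is just a stand-in for whatever RNNLM binary you'd actually invoke; it assumes the scorer reads one sentence on stdin and prints a single number.

```python
import subprocess

def rescore_sentence(sentence, cmd=("wc", "-w")):
    """Score a sentence by piping it to an external process.

    `cmd` is a placeholder: swap in the real invocation of your RNNLM
    scorer. Spawning one process per hypothesis is slow, so for real
    use you'd likely want a single long-running scorer process instead.
    """
    result = subprocess.run(cmd, input=sentence.encode("utf-8"),
                            stdout=subprocess.PIPE, check=True)
    return float(result.stdout.strip())
```

With that in place, the sorting line in the post-processor stays exactly the same; only the implementation of `rescore_sentence` changes.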
I just made this off the cuff, so if there are any problems or I misunderstood what you are trying to do, let me know.
@calderma Thank you so much! My RNN language model is still in training. I will test it with the method you described and let you know if it works.
@jin1004 Did the proposed solution of @calderma work?
Does anyone have a solution to this?