DeepSpeech

Replace KenLM Language Model with TF Language Model

Open kdavis-mozilla opened this issue 8 years ago • 9 comments

kdavis-mozilla avatar Jun 29 '17 18:06 kdavis-mozilla

@kdavis-mozilla Can we use our own language model with the existing acoustic model? If so, could you give a brief outline of how?

rajateku6 avatar Jan 18 '18 05:01 rajateku6

@rajateku6 This issue proposes a specific enhancement to the code. If you wish to train another KenLM language model, please add your question to the Discourse Forum.

kdavis-mozilla avatar Jan 18 '18 07:01 kdavis-mozilla

I would like to work on making the language model interface more general. Do you think this would be a good approach?

Make Scorer a generic base class that specific scorers inherit from (e.g. KenlmScorer). Each subtype would have to implement get_log_prob() in its own way. This way, someone could write their own scorer using TensorFlow much more easily. @ftyers
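
A rough Python sketch of that idea, for illustration only. Scorer, KenlmScorer and get_log_prob() are the names used above; the signature of get_log_prob(), the toy UnigramScorer, and the score_sentence() helper are all assumptions made up for this example and are not taken from the DeepSpeech code base.

```python
import math
from abc import ABC, abstractmethod
from typing import Dict, Sequence


class Scorer(ABC):
    """Hypothetical base class: the decoder would depend only on this interface."""

    @abstractmethod
    def get_log_prob(self, words: Sequence[str]) -> float:
        """Return the log probability of the last word given the preceding words."""


class UnigramScorer(Scorer):
    """Toy stand-in for a concrete subtype such as KenlmScorer (does not use KenLM)."""

    def __init__(self, counts: Dict[str, int], unk_log_prob: float = -10.0) -> None:
        total = sum(counts.values())
        self._log_probs = {w: math.log(c / total) for w, c in counts.items()}
        self._unk_log_prob = unk_log_prob  # arbitrary penalty for unseen words

    def get_log_prob(self, words: Sequence[str]) -> float:
        # A unigram model ignores the history and scores only the last word.
        return self._log_probs.get(words[-1], self._unk_log_prob)


def score_sentence(scorer: Scorer, sentence: Sequence[str]) -> float:
    """Sum per-word log probabilities; works with any Scorer subtype."""
    return sum(scorer.get_log_prob(sentence[: i + 1]) for i in range(len(sentence)))
```

Under these assumptions, a TensorFlow-backed scorer would be one more subclass that overrides get_log_prob(), and score_sentence() would not need to change.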

ksteimel avatar Jul 11 '19 18:07 ksteimel

Thanks for the interest, that'd be a great contribution! The approach sounds good, although I'm not entirely sure get_log_prob is the only API we'd have to surface from the Scorer. There's also the question of different scorers having different initialization and configuration parameters, and how to expose that in the main DeepSpeech API. But that's a worry for the future; just having an initial abstraction that can be used to experiment with new implementations would be really great to have.
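
One possible way to handle per-scorer configuration, as a minimal Python sketch. Every name here (KenlmScorerConfig, NeuralScorerConfig, SCORER_CONFIGS, make_scorer_config, and all field names and default values) is hypothetical and not part of the DeepSpeech API; the point is only that each scorer type can declare its own parameters behind a single entry point.

```python
from dataclasses import dataclass
from typing import Any, Dict, Type


@dataclass
class KenlmScorerConfig:
    # Hypothetical parameters for an n-gram scorer.
    lm_path: str
    alpha: float = 1.0  # arbitrary illustrative default
    beta: float = 0.0   # arbitrary illustrative default


@dataclass
class NeuralScorerConfig:
    # Hypothetical parameters for a TensorFlow-based scorer.
    checkpoint_dir: str
    batch_size: int = 16


# Registry mapping a scorer kind to the config type it expects.
SCORER_CONFIGS: Dict[str, Type[Any]] = {
    "kenlm": KenlmScorerConfig,
    "neural": NeuralScorerConfig,
}


def make_scorer_config(kind: str, **params: Any) -> Any:
    """Build the config for the requested scorer kind from keyword parameters."""
    try:
        config_cls = SCORER_CONFIGS[kind]
    except KeyError:
        raise ValueError(f"unknown scorer kind: {kind!r}") from None
    # The dataclass constructor raises TypeError for missing or unexpected parameters.
    return config_cls(**params)
```

With this shape, the main API would only need to accept a scorer kind plus free-form parameters, and parameters that do not belong to the chosen scorer would be rejected when the config is built.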

reuben avatar Jul 11 '19 19:07 reuben

Adding this to the 1.0.0 project. This may get implemented there, and it may not, depending on the LM work done there.

kdavis-mozilla avatar Jan 10 '20 10:01 kdavis-mozilla

Was this plan delayed or abandoned?

zaptrem avatar Sep 03 '20 18:09 zaptrem

FWIW, I think this would be an excellent addition: a TF or torch model, maybe leveraging a transformer library or something similar.

rhamnett avatar Sep 03 '20 18:09 rhamnett

> FWIW, I think this would be an excellent addition: a TF or torch model, maybe leveraging a transformer library or something similar.

I agree. The current model seems to underperform the competition in the real world while requiring a much larger file size. In the last two years, transformers have revolutionized the LM landscape. Architectures like BERT/DistilBERT/XLM also open interesting opportunities for improving a transcribed word based on context heard afterwards.

zaptrem avatar Sep 03 '20 19:09 zaptrem

It looks as though incorporating transformers as a language model into DeepSpeech has been on people's wish list for a while. Does anyone know if any progress has been made, or if anyone has managed this? I've looked but can't find anything; I'm not sure if I'm missing something.

tohalb avatar Jul 19 '21 10:07 tohalb