DeepSpeech

Phrase hints in inference calls

Open pvanickova opened this issue 5 years ago • 9 comments

It would be helpful to provide phrase hints (context words) during inference time to boost probability of certain domain specific phrases in the transcription.

E.g. when passing audio to the Python API, the user could pass a list of likely phrases for the context:

    phrase_hints = ['transverse compound fracture', 'high bp', 'per os']
    ds.enableDecoderWithLM(args.alphabet, args.lm, args.trie, LM_ALPHA, LM_BETA, phrase_hints)
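To make the request concrete, here is a minimal sketch of one way phrase hints could act at inference time: rescoring an n-best list of candidate transcripts. It does not use the DeepSpeech API. The function `rescore_with_hints`, the `bonus` parameter, and the candidate scores are all hypothetical and only illustrate the idea.

```python
from typing import List, Tuple


def rescore_with_hints(
    candidates: List[Tuple[str, float]],
    phrase_hints: List[str],
    bonus: float = 2.0,
) -> List[Tuple[str, float]]:
    """Add a fixed log-score bonus for each hint phrase found in a candidate.

    `candidates` holds (transcript, log_score) pairs; higher scores are better.
    """
    rescored = []
    for text, score in candidates:
        # Pad with spaces so that hints only match on whole-word boundaries.
        padded = f" {text.lower()} "
        hits = sum(1 for hint in phrase_hints if f" {hint.lower()} " in padded)
        rescored.append((text, score + bonus * hits))
    # Best candidate first.
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Toy n-best list: the acoustically preferred candidate is the wrong one.
candidates = [
    ("transfers compound fracture", -10.0),
    ("transverse compound fracture", -11.0),
]
phrase_hints = ["transverse compound fracture", "high bp", "per os"]

best_text, best_score = rescore_with_hints(candidates, phrase_hints)[0]
print(best_text, best_score)
```

Because the hints are an argument to the call, a different list can be supplied for every inference without touching the language model.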

pvanickova avatar Jan 07 '19 15:01 pvanickova

Couldn't this be addressed by a custom language model?

kdavis-mozilla avatar Jan 07 '19 15:01 kdavis-mozilla

The context may change dynamically - something that is a context for one inference wouldn't be a context for another one, e.g. different departments are using different terminology, different shops have different inventory, different parts of an app may have different context options, ...

Rebuilding the language model for each case would mean maintaining a lot of language models and updating them very frequently with new phrases.

Also, updating a general English language model with just a few high-probability phrases would probably require generating a lot of dummy text to give those phrases enough probability (I'm just guessing about this one).

pvanickova avatar Jan 07 '19 16:01 pvanickova

Good point.

One of the things we are thinking about is the ability to dynamically change language models, see #1678 (Allow use of several decoders (language models) with a single model in the API). Would that be a close enough fit to your use case? (I know you'd still have to create several language models, which may be too much of a pain.)

The reason I'm asking is we are trying to decide how to best add just this functionality.

kdavis-mozilla avatar Jan 07 '19 16:01 kdavis-mozilla

I've added my comments for the multiple language model feature in its thread.

Having the option to provide a list of expected phrases for the context would still be very useful in my scenario (pulling a subset of hint phrases from a frequently updated dictionary based on the source of the call).

Once there's a good way to combine probability from multiple language models, this might be implemented as an additional on-the-fly generated mini language model with high probabilities of the injected phrases perhaps?
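As a rough illustration of that idea, the sketch below builds a tiny unigram "hint" model from the phrases and linearly interpolates it with a base model. Everything here is an assumption made for illustration: the toy `base_lm` probabilities, the `hint_weight` value, and the helper names are hypothetical, and a real decoder would work with n-gram scores rather than a plain dictionary.

```python
import math
from collections import Counter
from typing import Dict, List


def build_hint_lm(phrase_hints: List[str]) -> Dict[str, float]:
    """Estimate unigram probabilities from the hint phrases alone."""
    counts = Counter(
        word for phrase in phrase_hints for word in phrase.lower().split()
    )
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolated_log_prob(
    word: str,
    base_lm: Dict[str, float],
    hint_lm: Dict[str, float],
    hint_weight: float = 0.3,
    floor: float = 1e-9,
) -> float:
    """Log of (1 - w) * P_base(word) + w * P_hint(word), with 0 < w < 1."""
    p_base = base_lm.get(word, floor)  # floor keeps unseen words above zero
    p_hint = hint_lm.get(word, 0.0)
    return math.log((1.0 - hint_weight) * p_base + hint_weight * p_hint)


# Toy base model in which the domain term is much rarer than its confusion.
base_lm = {"transfers": 1e-4, "transverse": 1e-6}
hint_lm = build_hint_lm(["transverse compound fracture", "high bp", "per os"])

for word in ("transfers", "transverse"):
    print(word, interpolated_log_prob(word, base_lm, hint_lm))
```

The hint model is cheap to rebuild for every call, so the base model never needs retraining. The trade-off is the choice of `hint_weight`: too high and every utterance gets pulled toward the hints.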

pvanickova avatar Jan 07 '19 16:01 pvanickova

Thanks!

kdavis-mozilla avatar Jan 07 '19 17:01 kdavis-mozilla

"Once there's a good way to combine probability from multiple language models, this might be implemented as an additional on-the-fly generated mini language model with high probabilities of the injected phrases perhaps?"

@pvanickova Have you got the required phrase hints working? I am also looking for the same. Please help me out! Thanks!

axchanda avatar Feb 04 '19 19:02 axchanda

Even with dynamic models, it's more accurate to provide context in the form of phrase hints at the time of inference. A language model built with those phrase hints would apply to every inference, whereas you would rather have certain phrases apply only to certain inferences during a session, not all of them.

SephVelut avatar May 25 '19 10:05 SephVelut

If #432 is completed, people would be able to experiment with ways of handling hints and context assistance more easily (possibly with a view to then including the more broadly applicable successful ones as part of the API)

I like the hints idea, but I think it might be valuable to gather together the distinct kinds of scenarios people want to be able to solve. In some cases distinct LMs make sense (switching between them or using them in combination, e.g. to extend vocabulary), and in others hints of specific words or potentially classes of word make sense (e.g. if you expect a number reply, it could be handy to bias in favour of numbers whilst still coping with other kinds of response).
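The "classes of word" case could look something like the following sketch, which nudges candidates containing words from an expected class while leaving every other candidate in the running. The `WORD_CLASSES` table, the `bias_toward_classes` function, and the scores are hypothetical and not part of any DeepSpeech API.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical word classes; a real system would need far fuller lists.
WORD_CLASSES: Dict[str, Set[str]] = {
    "number": {
        "zero", "one", "two", "three", "four", "five",
        "six", "seven", "eight", "nine", "ten",
    },
}


def bias_toward_classes(
    candidates: List[Tuple[str, float]],
    expected_classes: List[str],
    bonus_per_word: float = 1.0,
) -> List[Tuple[str, float]]:
    """Add a bonus for each word that belongs to an expected class."""
    boosted: Set[str] = set().union(
        *(WORD_CLASSES[name] for name in expected_classes)
    )
    rescored = [
        (text, score + bonus_per_word * sum(w in boosted for w in text.lower().split()))
        for text, score in candidates
    ]
    # Candidates without class words keep their score and stay in the list.
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# The prompt expects a number, but "no thanks" must remain a possible answer.
candidates = [("to", -3.0), ("two", -3.5), ("no thanks", -9.0)]
ranked = bias_toward_classes(candidates, ["number"])
print(ranked)
```

A soft bonus like this only reorders close calls such as "to" versus "two". A reply that is clearly something else should still win if no number candidate scores close to it.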

nmstoker avatar Sep 04 '19 17:09 nmstoker

I'm also trying to use hinting and substitution methods to correct errors and improve recognition. I'm using the deep speech model only as ASR. I've used the deep speech 2 model to build my own pbm and scorer, as I'm trying to improve the ASR for the Hindi language. I'm facing issues such as the model only catching "a" when I say "Haa". I need to fix that; can you please suggest how I can implement 'hints' or 'substitution' for it?

MrityunjoyS avatar Jul 14 '20 04:07 MrityunjoyS