NeMo Confidence score for each word in asr model

Hi! I am trying nvidia/stt_en_conformer_transducer_xlarge for my ASR system and the performance is impressive. I noticed that the transcribe function has an option that can include lm_score in the output. Is it possible to generate a confidence score for each word? If not, where should I modify the code so that it reports one? Thanks in advance.

Aug 02 '22 21:08 catalwaysright

We have not yet added support for LM or confidence scores in ASR models through the Hypothesis framework; those fields are placeholders for future support.

Aug 02 '22 21:08 titu1994

Then which part of the code should I modify if I want to support this on my own? Could you give me a guide or a direction?

Aug 02 '22 22:08 catalwaysright

It's not straightforward to set up word confidence, especially for RNNT models. I don't have a particular reference at the moment for computing word confidence. FYI @GNroy

Aug 03 '22 03:08 titu1994

@catalwaysright I'm working on word-level confidence for greedy CTC and RNNT decoding. Could you please describe your confidence use case?

Aug 03 '22 14:08 GNroy

> @catalwaysright I'm working on word-level confidence for greedy CTC and RNNT decoding. Could you please describe your confidence use case?

Basically, I just want word-level confidence like what you are working on, so that I can highlight the words that are more likely to be wrong when their confidence falls below a threshold. It would be even better if the model could produce the top 3 most confident candidate words for each position.
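To make the use case concrete, here is a minimal sketch of the highlighting step. It assumes word-level scores are already available as `(word, confidence)` pairs; the function name, the `**` markers, and the default threshold are hypothetical choices for illustration, not part of NeMo.

```python
def highlight_low_confidence(words, threshold=0.5):
    """Wrap every word whose confidence is below `threshold` in ** markers.

    `words` is a list of (word, confidence) pairs. Both the pair format and
    the ** markers are assumptions made for this sketch.
    """
    return " ".join(
        f"**{word}**" if conf < threshold else word
        for word, conf in words
    )


# Example: only the second word falls below the threshold.
example = [("hello", 0.9), ("wurld", 0.3)]
print(highlight_low_confidence(example, threshold=0.5))
```

The threshold would have to be tuned on held-out data, since raw scores from different models are not directly comparable.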

Aug 03 '22 17:08 catalwaysright

@GNroy How is your implementation going? Since we can get log-probabilities for CTC, is it possible to get word-level probabilities through them?

Aug 04 '22 00:08 catalwaysright

@catalwaysright We can certainly use log-probabilities as confidence scores, but this would not be the best approach because the probability distributions of the CTC and RNNT models are biased towards the best hypothesis. In other words, the confidence will be close to one for both correct and incorrect units. I'm working on an approach that alleviates this issue.
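As an illustration of the point above, the sketch below compares the plain maximum probability of a frame with a normalized-entropy score, which also takes the rest of the distribution into account. The entropy-based measure is only one possible alternative and is an assumption of this example; the thread does not say which approach is being developed.

```python
import math


def max_prob_confidence(probs):
    """Confidence as the probability of the best unit in one frame."""
    return max(probs)


def entropy_confidence(probs):
    """Confidence as 1 - H(p) / log(V) for one frame.

    Yields 1.0 for a one-hot distribution and 0.0 for a uniform one.
    This measure is an illustrative assumption, not NeMo's method.
    """
    vocab_size = len(probs)
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(vocab_size)


# A frame where the best unit wins, but with real competition behind it.
frame = [0.6, 0.3, 0.05, 0.05]
print(max_prob_confidence(frame))   # looks only at the winner
print(entropy_confidence(frame))    # penalizes the spread-out mass
```

On this frame the entropy-based score is noticeably lower than the maximum probability, because it accounts for the probability mass assigned to competing units.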

Aug 04 '22 18:08 GNroy

@GNroy Could you please explain the mechanism for calculating the word-level confidence score? I was following this blog, but unfortunately it does not seem convincing.

Aug 29 '22 06:08 shihabtechno

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

Oct 06 '22 02:10 github-actions[bot]

@shihabtechno Sorry for the late response. Word-level confidence estimation is now in main. In short, one has to calculate per-frame scores and then aggregate them into unit-level and word-level scores. Here you can find functions for per-frame confidence estimation and aggregation. This is how to aggregate word scores for a wordpiece-based model.
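The pipeline described above (per-frame scores, then unit-level, then word-level) can be sketched roughly as follows for greedy CTC decoding. This is a simplified illustration, not the code in NeMo's main branch: the function names, the `<blank>` symbol, the use of maximum probability as the per-frame score, the `min` aggregation, and the SentencePiece-style `▁` word-start marker are all assumptions of this example.

```python
import math

BLANK = "<blank>"  # hypothetical blank symbol used by this sketch


def frame_confidence(log_probs):
    """Per-frame score: probability of the best unit in the frame."""
    return math.exp(max(log_probs))


def greedy_ctc_tokens(frames, vocab):
    """Greedy CTC decoding that keeps one confidence per emitted token.

    Consecutive frames predicting the same token are merged, and their
    frame scores are aggregated with min (an assumed choice).
    """
    tokens = []
    prev = None
    for log_probs in frames:
        best = max(range(len(log_probs)), key=log_probs.__getitem__)
        tok = vocab[best]
        conf = frame_confidence(log_probs)
        if tok != BLANK:
            if tok == prev:
                # Same token repeated across frames: keep the weakest score.
                last_tok, last_conf = tokens[-1]
                tokens[-1] = (last_tok, min(last_conf, conf))
            else:
                tokens.append((tok, conf))
        prev = tok
    return tokens


def words_from_wordpieces(tokens, aggregate=min):
    """Join wordpieces into words and aggregate their token scores.

    A token starting with "▁" opens a new word (assumed convention).
    """
    words = []
    for tok, conf in tokens:
        if tok.startswith("▁") or not words:
            words.append([tok.lstrip("▁"), [conf]])
        else:
            words[-1][0] += tok
            words[-1][1].append(conf)
    return [(word, aggregate(confs)) for word, confs in words]
```

Swapping `aggregate=min` for a mean or a product changes how strongly one weak wordpiece pulls down the score of the whole word; which choice works best depends on the model and the data.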

Oct 06 '22 11:10 GNroy

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

Nov 07 '22 02:11 github-actions[bot]

This issue was closed because it has been inactive for 7 days since being marked as stale.

Nov 15 '22 02:11 github-actions[bot]
