NeMo
Confidence score for each word in ASR model
Hi! I am trying nvidia/stt_en_conformer_transducer_xlarge for my ASR system and the performance is impressive. I noticed that the transcribe function has an option that can include lm_score in the output. Is it possible to generate a confidence score for each word, or where should I modify the code so that it shows one? Thanks in advance.
We have not yet added support for LM or confidence scores in ASR models using the Hypothesis framework; those fields are placeholders for future support.
Then which part of the code should I modify if I want to support this on my own? Could you give me a guide or a direction?
It's not straightforward to set up word confidence, especially for RNNT models. I don't have a particular reference at the moment for computing word confidence. FYI @GNroy
@catalwaysright I'm working on word-level confidence for greedy CTC and RNNT decoding. Could you please describe your confidence use case?
Basically I just want word-level confidence like the one you are working on, so that I can highlight the words that are more likely to be wrong when their confidence is below a threshold. It would be even better if the model could generate the top 3 most confident words for each position.
@GNroy How is your implementation going right now? Since we can get log probs for CTC, is it possible to get word-level probabilities through them?
@catalwaysright We can certainly use log-probabilities as confidence scores, but this would not be the best approach because the probability distributions of the CTC and RNNT models are biased towards the best hypothesis. In other words, the confidence will be close to one for both correct and incorrect units. I'm working on an approach that alleviates this issue.
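For reference, the log-probability baseline described above can be sketched in a few lines. This is a minimal illustration, not NeMo code: `max_prob_confidence` is a hypothetical helper, and `frames` is made-up data standing in for per-frame log-softmax outputs of a greedy decoder.

```python
import math

def max_prob_confidence(frame_log_probs):
    """For each frame, return (best_index, confidence).

    Confidence is exp(max log-prob), i.e. the probability of the
    greedy (arg-max) symbol at that frame.
    """
    results = []
    for log_probs in frame_log_probs:
        best = max(range(len(log_probs)), key=lambda i: log_probs[i])
        results.append((best, math.exp(log_probs[best])))
    return results

# Hypothetical log-softmax outputs over a 3-symbol vocabulary (assumption).
frames = [
    [math.log(0.97), math.log(0.02), math.log(0.01)],
    [math.log(0.05), math.log(0.90), math.log(0.05)],
]

for index, confidence in max_prob_confidence(frames):
    print(index, round(confidence, 2))
```

If the distributions are as peaked as described, these values sit near one whether or not the chosen symbol is correct, which is why this baseline separates right from wrong units poorly.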
@GNroy Could you please explain the mechanism for calculating the word-level confidence score? I was following this blog, but unfortunately it does not seem convincing.
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
@shihabtechno Sorry for the late response. Word level confidence estimation is now in main. In short, one has to calculate per-frame scores and then aggregate them to unit- and word-level scores. Here you can find functions for per-frame confidence estimation and aggregation. This is how to aggregate word scores for a wordpiece-based model.
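The frame-to-unit-to-word aggregation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the NeMo implementation: all function names are hypothetical, the per-frame scores and frame spans are made-up inputs, and the `▁` word-start marker assumes a SentencePiece-style wordpiece tokenizer.

```python
import math

WORD_START = "\u2581"  # "▁", SentencePiece-style word-boundary marker (assumption)

def aggregate(scores, method="min"):
    """Combine several confidence scores into one."""
    if method == "min":
        return min(scores)
    if method == "mean":
        return sum(scores) / len(scores)
    if method == "prod":
        return math.prod(scores)
    raise ValueError(f"unknown aggregation method: {method}")

def token_confidences(frame_conf, token_spans, method="min"):
    """Aggregate per-frame scores over each token's (start, end) frame span (end exclusive)."""
    return [aggregate(frame_conf[start:end], method) for start, end in token_spans]

def word_confidences(tokens, token_conf, method="min"):
    """Group wordpieces into words and aggregate their token-level scores."""
    words = []  # each entry: [word_text, [token scores]]
    for token, conf in zip(tokens, token_conf):
        if token.startswith(WORD_START) or not words:
            words.append([token.lstrip(WORD_START), [conf]])
        else:
            words[-1][0] += token
            words[-1][1].append(conf)
    return [(text, aggregate(scores, method)) for text, scores in words]

# Hypothetical inputs: six frame scores, four wordpieces with their frame spans.
frame_conf = [0.9, 0.95, 0.8, 0.6, 0.7, 0.95]
tokens = ["\u2581hel", "lo", "\u2581wor", "ld"]
token_spans = [(0, 2), (2, 3), (3, 5), (5, 6)]

tok_conf = token_confidences(frame_conf, token_spans)
word_conf = word_confidences(tokens, tok_conf)
low_confidence = [word for word, conf in word_conf if conf < 0.7]
print(word_conf, low_confidence)
```

With `min` aggregation a word is only as confident as its weakest frame, so here "world" falls below the 0.7 threshold and would be the word to highlight. `mean` and `prod` are included only to show that the aggregation is a design choice.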
This issue was closed because it has been inactive for 7 days since being marked as stale.