NeMo
Greedy decoding confidence for CTC and RNNT
What does this PR do?
Confidence estimation based on the maximum probability and entropy methods for CTC and RNNT models.
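For readers unfamiliar with the two measures, here is a minimal sketch of how maximum-probability and entropy-based confidence can be computed for a single frame, and how per-frame scores can be aggregated with "mean" or "min". It is an illustration under stated assumptions, not the code in this PR: the function names are hypothetical, and the entropy score is assumed to be normalized by the log of the vocabulary size so that it lies in [0, 1].

```python
import math


def max_prob_confidence(probs):
    """Confidence as the highest probability in the frame's distribution."""
    return max(probs)


def entropy_confidence(probs):
    """Confidence as 1 minus the entropy normalized by log(vocabulary size).

    Assumption: this normalization is one common choice; the PR may use
    a different formulation.
    """
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return 1.0 - entropy / math.log(len(probs))


def aggregate(frame_scores, method="mean"):
    """Collapse per-frame scores into one score (hypothetical helper)."""
    if method == "mean":
        return sum(frame_scores) / len(frame_scores)
    if method == "min":
        return min(frame_scores)
    raise ValueError(f"unknown aggregation method: {method}")


uniform = [0.25, 0.25, 0.25, 0.25]  # least confident distribution
one_hot = [1.0, 0.0, 0.0, 0.0]      # most confident distribution
print(max_prob_confidence(uniform), entropy_confidence(uniform))
print(max_prob_confidence(one_hot), entropy_confidence(one_hot))
print(aggregate([0.9, 0.5, 0.7], method="min"))
```

A uniform distribution gets the lowest score from both measures and a one-hot distribution the highest; "min" aggregation is the more pessimistic choice because a single uncertain frame lowers the whole score.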
Collection: ASR
Changelog
- Hypothesis now has frame_confidence, token_confidence, and word_confidence attributes
Usage
- You can potentially add a usage example below
python scripts/speech_recognition/benchmark_asr_confidence.py pretrained_name=stt_en_conformer_transducer_large_ls dataset_manifest=<librispeech>/test_other.json output_dir=<your_output_dir> 'grid_params="{\"aggregation\": [\"mean\", \"min\"], \"temperature\": [0.33, 0.5]}"'
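The grid_params argument above is a JSON object mapping each parameter name to a list of candidate values. As a rough sketch of what such a grid means, the snippet below expands it into every combination. The expand_grid helper is hypothetical and only illustrates the idea; how the benchmark script actually consumes the grid is not shown in this PR description.

```python
import itertools
import json


def expand_grid(grid_json):
    """Expand a JSON grid into a list of parameter dicts (hypothetical helper)."""
    grid = json.loads(grid_json)
    names = sorted(grid)  # fixed order so the output is deterministic
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]


configs = expand_grid('{"aggregation": ["mean", "min"], "temperature": [0.33, 0.5]}')
for config in configs:
    print(config)
```

With two aggregation methods and two temperature values, the grid yields four configurations.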
Before your PR is "Ready for review"
Pre checks:
- [x] Make sure you read and followed Contributor guidelines
- [ ] Did you write any new necessary tests?
- [x] Did you add or update any necessary documentation?
- [ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- [ ] Reviewer: Does the PR have correct import guards for all optional libraries?
PR Type:
- [x] New Feature
- [ ] Bugfix
- [ ] Documentation
If you haven't finished some of the above items you can still open "Draft" PR.
Who can review?
@vsl9 @hainan-xv @titu1994
This pull request introduces 16 alerts when merging 9cf050d0541f85058f58d950c28c9b61d852629f into efc0c0418644b317b24218680c2ee3f311bc5fe0 - view on LGTM.com
new alerts:
- 11 for Unused import
- 5 for Unused local variable
Unless the script supports every possible combination of model architecture x CTC/RNNT x char/subword, it should go in a subdirectory of examples/asr rather than at the root of examples/asr
@titu1994 Sorry if I misunderstood you.
My script supports CTC/RNNT x Char/subword and it is in scripts/speech_recognition. Do you suggest moving it into examples/asr?
This pull request introduces 1 alert when merging d7658b57b196742ec6f517963a20e3a4de3983c2 into efc0c0418644b317b24218680c2ee3f311bc5fe0 - view on LGTM.com
new alerts:
- 1 for Unused local variable
It's supposed to be a script rather than an example, I see now. I'd still advise using a subdir inside scripts/speech_recognition, since it is specific to finding confidence per word
No, move it to a subdir inside scripts/asr
No, move it to a subdir inside scripts/speech_recognition/confidence
When preserve_alignments = True and compute_timestamps = True are both set on an RNNT model, it looks like the timestep in the hypothesis becomes a dict, so self.timestep at rnnt_utils.py:117 has to be changed to self.timestep['timestep']. I ran into this with the stt_zh_conformer_transducer_large model.
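One possible way to tolerate both shapes is sketched below. This is only an illustration of the workaround described in the comment, not a tested patch for rnnt_utils.py: the get_timesteps helper is hypothetical, and the only dict key assumed is 'timestep', as reported above.

```python
def get_timesteps(timestep):
    """Return the list of timesteps whether `timestep` is a list or a dict.

    Assumption: when timestamps are computed, the dict stores the original
    list under the 'timestep' key, as reported in the comment above.
    """
    if isinstance(timestep, dict):
        return timestep["timestep"]
    return timestep


print(get_timesteps([0, 3, 7]))
# The 'word' key is a made-up placeholder for whatever else the dict holds.
print(get_timesteps({"timestep": [0, 3, 7], "word": []}))
```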