What does this PR do ?

Adds confidence estimation based on the maximum-probability and entropy methods for greedy decoding of CTC and RNNT models.
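
For readers unfamiliar with the two methods, here is a minimal sketch of how a per-frame confidence could be derived from one frame's log-probabilities. The function names and the normalization of entropy by log(vocabulary size) are assumptions made for illustration; they are not taken from this PR's code.

```python
import math


def max_prob_confidence(log_probs):
    """Confidence as the largest probability in one frame's distribution."""
    return math.exp(max(log_probs))


def entropy_confidence(log_probs):
    """Confidence as 1 minus Shannon entropy, normalized by log(vocab size).

    Assumes at least two entries; zero-probability entries (log-prob of -inf)
    contribute nothing to the entropy and are skipped.
    """
    entropy = -sum(math.exp(lp) * lp for lp in log_probs if lp > -math.inf)
    return 1.0 - entropy / math.log(len(log_probs))


# Hypothetical frame over a three-symbol vocabulary.
frame = [math.log(p) for p in (0.7, 0.2, 0.1)]
print(max_prob_confidence(frame))
print(entropy_confidence(frame))
```

Under this normalization a uniform distribution scores 0 and a one-hot distribution scores 1.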

Collection: ASR

Changelog

Hypothesis now has frame_confidence, token_confidence, and word_confidence attributes

Usage


python scripts/speech_recognition/benchmark_asr_confidence.py pretrained_name=stt_en_conformer_transducer_large_ls dataset_manifest=<librispeech>/test_other.json output_dir=<your_output_dir> 'grid_params="{\"aggregation\": [\"mean\", \"min\"], \"temperature\": [0.33, 0.5]}"'
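
As a rough illustration of what the grid_params argument describes, the sketch below parses the same JSON payload (with the shell escaping removed) and expands it into individual settings. Treating the grid as a Cartesian product is an assumption about the script's behavior, and expand_grid and aggregate are hypothetical helpers, not functions from the script. The effect of temperature is not modeled here.

```python
import itertools
import json

# JSON payload of grid_params from the command above, unescaped.
grid_params = '{"aggregation": ["mean", "min"], "temperature": [0.33, 0.5]}'


def expand_grid(grid_json):
    """Expand a {name: [values]} grid into a list of {name: value} settings."""
    grid = json.loads(grid_json)
    names = list(grid)
    combos = itertools.product(*(grid[name] for name in names))
    return [dict(zip(names, values)) for values in combos]


# The two aggregation names that appear in the grid.
AGGREGATIONS = {
    "mean": lambda values: sum(values) / len(values),
    "min": min,
}


def aggregate(confidences, method):
    """Collapse several confidence scores (e.g. tokens of a word) into one."""
    return AGGREGATIONS[method](confidences)


configs = expand_grid(grid_params)
for config in configs:
    print(config)
```

With two values per parameter, this expansion yields four settings.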

Before your PR is "Ready for review"

Pre checks:

[x] Make sure you read and followed Contributor guidelines
[ ] Did you write any new necessary tests?
[x] Did you add or update any necessary documentation?
[ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- [ ] Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

[x] New Feature
[ ] Bugfix
[ ] Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

@vsl9 @hainan-xv @titu1994

Sep 14 '22 15:09 GNroy

This pull request introduces 16 alerts when merging 9cf050d0541f85058f58d950c28c9b61d852629f into efc0c0418644b317b24218680c2ee3f311bc5fe0 - view on LGTM.com

new alerts:

11 for Unused import
5 for Unused local variable

Sep 14 '22 16:09 lgtm-com[bot]

Unless the script supports every possible combination of model architecture x CTC/RNNT x char/subword, it should go in a subdirectory of examples/asr rather than at the root of examples/asr.

Sep 14 '22 16:09 titu1994

@titu1994 Sorry if I misunderstood you. My script supports CTC/RNNT x Char/subword and it is in scripts/speech_recognition. Do you suggest moving it into examples/asr?

Sep 14 '22 16:09 GNroy

This pull request introduces 1 alert when merging d7658b57b196742ec6f517963a20e3a4de3983c2 into efc0c0418644b317b24218680c2ee3f311bc5fe0 - view on LGTM.com

new alerts:

1 for Unused local variable

Sep 14 '22 17:09 lgtm-com[bot]

It's supposed to be a script rather than an example, I see now. I'd still advise using a subdir inside scripts/speech_recognition, since it is specifically for finding confidence per word.

Sep 14 '22 19:09 titu1994

No, move it to a subdir inside scripts/asr

Sep 14 '22 20:09 titu1994

No, move it to a subdir inside scripts/speech_recognition/confidence

Sep 14 '22 20:09 titu1994

When preserve_alignments = True and compute_timestamps = True are both set on an RNNT model, it looks like the timestep in the hypothesis becomes a dict, and self.timestep at rnnt_utils.py:117 has to be changed to self.timestep['timestep']. I hit this issue with the stt_zh_conformer_transducer_large model.
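
A defensive way to handle both shapes could look like the sketch below. get_timestep_list is a hypothetical helper written for illustration, not code from rnnt_utils.py, and the dict layout (a 'timestep' key holding the per-token list) is assumed from the description above.

```python
def get_timestep_list(timestep):
    """Return per-token timesteps whether stored as a list or as a dict.

    Assumes that, when timestamps are computed, the list lives under the
    'timestep' key of the dict.
    """
    if isinstance(timestep, dict):
        return timestep['timestep']
    return timestep


# Both shapes yield the same list.
print(get_timestep_list([0, 3, 7]))
print(get_timestep_list({'timestep': [0, 3, 7], 'word': []}))
```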

Sep 20 '22 18:09 guillermo-gabrielli-fer


Greedy decoding confidence for CTC and RNNT
