Greedy decoding confidence for CTC and RNNT

Open GNroy opened this issue 2 years ago • 8 comments

What does this PR do ?

Confidence estimation based on the maximum probability and entropy methods for CTC and RNNT models.
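For readers unfamiliar with the two measures, here is a minimal sketch of how a per-frame confidence could be derived from one frame's probability distribution. It is illustrative only, not the code in this PR: the function names `max_prob_confidence` and `entropy_confidence` are hypothetical, and the normalization of the entropy to the range [0, 1] is an assumption.

```python
import math


def max_prob_confidence(probs):
    """Confidence as the probability of the most likely token in the frame."""
    return max(probs)


def entropy_confidence(probs):
    """Confidence as 1 minus the normalized Shannon entropy of the frame.

    Assumption: entropy is divided by log(vocabulary size) so that a uniform
    distribution maps to 0.0 and a one-hot distribution maps to 1.0.
    """
    n = len(probs)
    if n < 2:
        raise ValueError("need at least two classes to normalize entropy")
    # Zero-probability entries contribute nothing to the entropy.
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return 1.0 - entropy / math.log(n)


if __name__ == "__main__":
    frame = [0.7, 0.2, 0.05, 0.05]  # made-up probabilities for one frame
    print(max_prob_confidence(frame))
    print(entropy_confidence(frame))
```

Both measures agree at the extremes (a one-hot frame scores 1.0), but the entropy-based score also reflects how the remaining probability mass is spread over the other tokens.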

Collection: ASR

Changelog

  • Hypothesis now has frame_confidence, token_confidence, and word_confidence attributes
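The relationship between the three attributes can be pictured as successive aggregation: frame scores are reduced to token scores, and token scores to word scores. The sketch below shows that idea with the `mean` and `min` aggregations named in the usage command. The helpers `aggregate` and `group_confidence`, and the representation of groups as a list of lengths, are assumptions made for illustration and are not the PR's actual data layout.

```python
def aggregate(scores, method="mean"):
    """Reduce a list of confidence scores to one value."""
    if not scores:
        raise ValueError("cannot aggregate an empty list of scores")
    if method == "mean":
        return sum(scores) / len(scores)
    if method == "min":
        return min(scores)
    raise ValueError(f"unknown aggregation method: {method}")


def group_confidence(scores, group_lengths, method="mean"):
    """Aggregate consecutive scores into groups of the given lengths.

    Hypothetical layout: group_lengths[i] is the number of consecutive
    entries of `scores` belonging to group i (frames per token, or
    tokens per word).
    """
    if sum(group_lengths) != len(scores):
        raise ValueError("group lengths must cover all scores exactly")
    result, start = [], 0
    for length in group_lengths:
        result.append(aggregate(scores[start:start + length], method))
        start += length
    return result


if __name__ == "__main__":
    token_scores = [0.9, 0.8, 0.6]  # made-up token confidences
    # First word spans two tokens, second word spans one.
    print(group_confidence(token_scores, [2, 1], method="min"))
    print(group_confidence(token_scores, [2, 1], method="mean"))
```

With `min`, a word is only as confident as its weakest token; `mean` smooths over a single uncertain token.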

Usage

  • You can potentially add a usage example below
python scripts/speech_recognition/benchmark_asr_confidence.py pretrained_name=stt_en_conformer_transducer_large_ls dataset_manifest=<librispeech>/test_other.json output_dir=<your_output_dir> 'grid_params="{\"aggregation\": [\"mean\", \"min\"], \"temperature\": [0.33, 0.5]}"'

Before your PR is "Ready for review"

Pre checks:

  • [x] Make sure you read and followed Contributor guidelines
  • [ ] Did you write any new necessary tests?
  • [x] Did you add or update any necessary documentation?
  • [ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • [ ] Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • [x] New Feature
  • [ ] Bugfix
  • [ ] Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

@vsl9 @hainan-xv @titu1994

GNroy avatar Sep 14 '22 15:09 GNroy

This pull request introduces 16 alerts when merging 9cf050d0541f85058f58d950c28c9b61d852629f into efc0c0418644b317b24218680c2ee3f311bc5fe0 - view on LGTM.com

new alerts:

  • 11 for Unused import
  • 5 for Unused local variable

lgtm-com[bot] avatar Sep 14 '22 16:09 lgtm-com[bot]

Unless the script supports every possible combination of model architecture x CTC/RNNT x char/subword, it should go in a subdirectory of examples/asr rather than at the root of examples/asr.

titu1994 avatar Sep 14 '22 16:09 titu1994

@titu1994 Sorry if I misunderstood you. My script supports CTC/RNNT x Char/subword and it is in scripts/speech_recognition. Do you suggest moving it into examples/asr?

GNroy avatar Sep 14 '22 16:09 GNroy

This pull request introduces 1 alert when merging d7658b57b196742ec6f517963a20e3a4de3983c2 into efc0c0418644b317b24218680c2ee3f311bc5fe0 - view on LGTM.com

new alerts:

  • 1 for Unused local variable

lgtm-com[bot] avatar Sep 14 '22 17:09 lgtm-com[bot]

I see now that it's supposed to be a script rather than an example. I'd still advise using a subdir inside scripts/speech_recognition, since its purpose is to compute confidence per word.

titu1994 avatar Sep 14 '22 19:09 titu1994

No, move it to a subdir inside scripts/asr

titu1994 avatar Sep 14 '22 20:09 titu1994

No, move it to a subdir inside scripts/speech_recognition/confidence

titu1994 avatar Sep 14 '22 20:09 titu1994

When preserve_alignments = True and compute_timestamps = True are both set on an RNNT model, it looks like the timestep in the hypothesis becomes a dict, and self.timestep on rnnt_utils.py:117 has to be changed to self.timestep['timestep']. I hit this issue with the stt_zh_conformer_transducer_large model.
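One way to tolerate both shapes of the field is a small accessor that unwraps the dict when present. This is only a sketch of the workaround described above: `timestep_values` is a hypothetical helper, not a function in rnnt_utils.py, and the only thing taken from the report is that the dict carries a `'timestep'` key.

```python
def timestep_values(timestep):
    """Return per-token timesteps as a list, whatever shape the field has.

    Assumption (from the report above): with timestamps enabled the field
    may be a dict holding the raw values under the 'timestep' key;
    otherwise it is already a plain sequence.
    """
    if isinstance(timestep, dict):
        return list(timestep["timestep"])
    return list(timestep)


if __name__ == "__main__":
    print(timestep_values([3, 7, 12]))                # plain sequence
    print(timestep_values({"timestep": [3, 7, 12]}))  # dict form
```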

guillermo-gabrielli-fer avatar Sep 20 '22 18:09 guillermo-gabrielli-fer