"greedy_batched" methods should support "partial_hypotheses" option
Is your feature request related to a problem? Please describe.
I've been experimenting with examples/asr/asr_cache_aware_streaming/speech_to_text_cache_aware_streaming_infer.py. One of the things I've noticed is that the "greedy_batched" strategy does not support partial hypotheses. We should add support for this. Right now, streaming of RNN-T models is horrendously slow: because we must use the "greedy" strategy to carry partial hypotheses across chunks, the decoder runs at batch size 1. The encoder basically isn't meaningfully contributing to the runtime; the decoder is the main slowdown.
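For concreteness, here is a rough sketch of the gap. The decoder objects and exact signatures are assumptions, loosely modeled on the non-batched `GreedyRNNTInfer.forward(encoder_output, encoded_lengths, partial_hypotheses)`; the second function shows the requested behavior, not an existing API:

```python
# Illustrative sketch only -- decoder callables and signatures are assumed,
# modeled on the non-batched greedy RNN-T decoder, which already accepts a
# partial_hypotheses argument and returns a tuple wrapping a hypothesis list.

def stream_decode_today(greedy_infer, enc_out, enc_len, prev_hyps):
    """Current workaround: decode each stream separately so partial
    hypotheses can be carried across chunks -- decoder batch size is 1."""
    new_hyps = []
    for i in range(enc_out.shape[0]):
        partial = [prev_hyps[i]] if prev_hyps is not None else None
        (hyps,) = greedy_infer(
            encoder_output=enc_out[i : i + 1],
            encoded_lengths=enc_len[i : i + 1],
            partial_hypotheses=partial,
        )
        new_hyps.append(hyps[0])
    return new_hyps


def stream_decode_requested(greedy_batched_infer, enc_out, enc_len, prev_hyps):
    """Requested behavior: the batched greedy decoder accepts the same
    partial_hypotheses argument, so the whole batch decodes in one call."""
    (hyps,) = greedy_batched_infer(
        encoder_output=enc_out,
        encoded_lengths=enc_len,
        partial_hypotheses=prev_hyps,  # hypothetical new parameter
    )
    return hyps
```

With that support, the streaming script could keep one list of hypotheses per batch across chunks and stay on the fast batched decoding path.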
FYI @artbataev.
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been inactive for 7 days since being marked as stale.