
Is it possible to get timestamps for BeamSearchLSTM-based inference?

Open OleguerCanal opened this issue 3 years ago • 3 comments

❓ Questions & Help

I see that the output can be aligned (per-token timestamps) when using the CTCBeamDecoder. Can I also get timestamps when using another decoder, such as an LSTM- or transformer-based one?
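For reference, here is a minimal sketch of what I mean by per-token timestamps in the CTC case. It is not openspeech's or CTCBeamDecoder's API: the function name `ctc_token_frames`, the blank id of 0, and the 40 ms frame duration are all assumptions made for illustration. It takes a per-frame best path and records the frame at which each token first appears.

```python
from typing import List, Tuple


def ctc_token_frames(frame_ids: List[int], blank_id: int = 0) -> List[Tuple[int, int]]:
    """Collapse a per-frame CTC best path into (token_id, start_frame) pairs.

    Hypothetical helper for illustration only; blank_id=0 is an assumption.
    """
    emitted = []
    previous = blank_id
    for frame_index, token_id in enumerate(frame_ids):
        # A token starts on the first frame of each non-blank run;
        # repeated frames of the same token are merged, blanks are skipped.
        if token_id != blank_id and token_id != previous:
            emitted.append((token_id, frame_index))
        previous = token_id
    return emitted


# Assumed frame duration in seconds (depends on the model's subsampling).
FRAME_DURATION = 0.04

best_path = [0, 3, 3, 0, 0, 5, 5, 5, 0, 3]
for token_id, start_frame in ctc_token_frames(best_path):
    print(token_id, round(start_frame * FRAME_DURATION, 2))
```

An attention-based LSTM or transformer decoder has no such per-frame path, which is why I am asking whether an equivalent exists for those decoders.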

OleguerCanal avatar Mar 04 '22 15:03 OleguerCanal

Hello, @OleguerCanal

We currently do not provide it. Also, as far as I know, E2E timestamps (including those from the CTC decoder) perform relatively poorly. How was your experience using CTCBeamDecoder?

upskyy avatar Mar 05 '22 14:03 upskyy

Hey @upskyy, no worries. If I can, I'll try to implement something and open a PR. I haven't had much time to play with the CTCBeamDecoder yet. I will get back to you when I've tested it more.

OleguerCanal avatar Mar 14 '22 14:03 OleguerCanal

Do we have alignment implemented for the transducer model?

OleguerCanal avatar Mar 17 '22 11:03 OleguerCanal