No duration or confidence info for alignments
I've tried generating alignments for a pruned_transducer_stateless7 model using https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless7/compute_ali.py. Looking at the output cuts, I can only find the start times for tokens/words. Is there a way to get the duration and confidence information too?
Sorry, we can only get the start time of a token for transducer models.
Is it safe to assume that a token's end time is the start time of the next token? That wouldn't be accurate if there is silence between words, unless the aligner can predict blank tokens. Can this be achieved?
As for the confidences, I've thought about estimating them from the log_probs, as in #1092.
I think the time in the alignment corresponds to the position at which the non-blank symbol was emitted by the transducer model. The transducer posteriors are all blanks, with 1-frame spikes for the non-blank symbols, so token durations cannot be recovered from the posteriors themselves. Some heuristic is needed, possibly also relying on endpointing...
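A minimal sketch of one such heuristic, assuming the per-token emission frames from compute_ali.py are available (the frame shift and the duration cap below are hypothetical values, not anything icefall prescribes): take each token's end to be the next token's start, and cap the duration so a long silence is not swallowed by the preceding token.

```python
def spikes_to_intervals(starts, num_frames, frame_shift_s=0.04, max_dur_s=0.5):
    """Turn 1-frame emission spikes into (start, duration) pairs in seconds.

    starts:        per-token emission frames (ascending).
    num_frames:    utterance length in frames.
    frame_shift_s: seconds per frame (e.g. 0.04 for 10 ms features with
                   4x subsampling) -- an assumption, adjust to your model.
    max_dur_s:     cap so inter-word silence is not attributed to a token.
    """
    intervals = []
    for i, start in enumerate(starts):
        # A token "ends" where the next one starts; the last one at the
        # utterance end.
        end = starts[i + 1] if i + 1 < len(starts) else num_frames
        dur = min((end - start) * frame_shift_s, max_dur_s)
        intervals.append((start * frame_shift_s, dur))
    return intervals

# Example: spikes at frames 3, 10, 11 in a 30-frame utterance.
print(spikes_to_intervals([3, 10, 11], num_frames=30))
```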
For that project I was experimenting with the transducer confidences; this was integrated into sherpa-onnx. However, the best normalized cross-entropy (NCE) achieved was only 0.169, which is quite low...
The way to compute confidence was (sketched below):
- not considering the blank posteriors
- temperature scaling (T = 2.0) of the joiner output
- the lowest token posterior serving as a proxy for the word score
- a 2-parameter logistic regression to calibrate the word score
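A minimal sketch of that recipe, assuming per-frame joiner logits and the emission frames/ids of a word's tokens are available (the calibration parameters `a` and `b` are placeholders one would fit on held-out data, not values from the sherpa-onnx integration):

```python
import torch

def word_confidence(joiner_logits, token_frames, token_ids,
                    blank_id=0, temperature=2.0, a=1.0, b=0.0):
    """Calibrated confidence for one word.

    joiner_logits: (num_frames, vocab_size) raw joiner outputs.
    token_frames:  frame index at which each of the word's tokens was emitted.
    token_ids:     the emitted (non-blank) token ids.
    a, b:          2-parameter logistic-regression calibration
                   (hypothetical values; fit them on held-out data).
    """
    # Temperature-scale the joiner output before the softmax.
    log_probs = torch.log_softmax(joiner_logits / temperature, dim=-1)

    # Ignore the blank posteriors: renormalize over non-blank tokens only.
    keep = torch.ones(log_probs.size(-1), dtype=torch.bool)
    keep[blank_id] = False
    nonblank = torch.log_softmax(log_probs[:, keep], dim=-1)

    # Posterior of each emitted token at its emission frame.
    # Ids above blank_id shift down by one once the blank column is dropped.
    rows = torch.tensor(token_frames)
    cols = torch.tensor([t - 1 if t > blank_id else t for t in token_ids])
    token_post = nonblank[rows, cols].exp()

    # The lowest token posterior is the (proxy) word score ...
    word_score = token_post.min()

    # ... calibrated with a 2-parameter logistic regression.
    return torch.sigmoid(a * word_score + b)
```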
I'm not sure whether better NCE results could be achieved by taking the acoustic score "under" the CTC alignment. It is possible, but I did not try that... maybe later...
Perhaps this could be integrated with icefall, even without the FSTs: https://pytorch.org/audio/main/tutorials/ctc_forced_alignment_api_tutorial.html
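A minimal sketch of that torchaudio API, with random placeholder emissions just to show the shapes (a real integration would feed CTC log-probs from an icefall model instead):

```python
import torch
import torchaudio.functional as F

# log_probs: (batch=1, num_frames, vocab) CTC emissions; targets: the
# transcript's token ids. These are random placeholders, not a real model.
torch.manual_seed(0)
log_probs = torch.randn(1, 50, 30).log_softmax(dim=-1)
targets = torch.tensor([[7, 3, 12, 5]])

# Frame-level forced alignment and per-frame log-prob scores.
aligned_tokens, scores = F.forced_align(log_probs, targets, blank=0)

# Collapse repeats/blanks into per-token spans with start/end frames.
spans = F.merge_tokens(aligned_tokens[0], scores[0], blank=0)
for span in spans:
    print(span.token, span.start, span.end, f"{span.score:.3f}")
```

merge_tokens yields per-token spans, which directly give start/end frames plus an averaged score that could serve as a confidence proxy, without building any FSTs.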