CTranslate2

Whisper - Correct way to get prediction probability of each token and timestamp alignment

Open huydang2106 opened this issue 7 months ago • 0 comments

I have spent time looking at the documentation but could not find the proper way to get the prediction probabilities of all tokens. Also, how can I get the token-level timestamp alignment from the model with just the audio input features as input? I did see the align function, but it requires both the input features and the input text tokens, which does not meet my need.
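
To make the first request concrete, here is a minimal sketch of the per-token output I am after. It uses plain Python on made-up logits and does not call CTranslate2. The helper name `token_probabilities` and the example values are hypothetical, and it assumes the decoder's per-step logits over the vocabulary are available.

```python
import math


def token_probabilities(step_logits, token_ids):
    """Return the softmax probability of the chosen token at each decoding step.

    Hypothetical helper for illustration only, not part of CTranslate2.
    step_logits: one list of vocabulary logits per decoding step.
    token_ids: the token id that was selected at each step.
    """
    probs = []
    for logits, token_id in zip(step_logits, token_ids):
        # Subtract the largest logit before exponentiating for numerical stability.
        peak = max(logits)
        exps = [math.exp(x - peak) for x in logits]
        probs.append(exps[token_id] / sum(exps))
    return probs


# Made-up logits over a 3-token vocabulary for two decoding steps.
step_logits = [[2.0, 0.5, 0.1], [0.0, 0.0, 0.0]]
token_ids = [0, 2]
probs = token_probabilities(step_logits, token_ids)
```

Ideally the model would return one such probability per generated token, rather than a single score for the whole sequence.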

huydang2106, Dec 06 '23 10:12