LibreASR
word timestamp
Can I get word timestamps when running prediction on an audio file?
Hey!
You can extract alignment information by adding some code after line 886:
https://github.com/iceychris/LibreASR/blob/f08c8e89a34c6cc3dbf9f2db86f7cc87d84ab003/libreasr/lib/models.py#L853-L892
Save the current encoder timestamp index t (from line 855) together with the current output token pred in a list.
To convert t to seconds, you could use something like:
# duration of one encoder step in seconds, depends on the model architecture
# usually 80 ms
encoder_freq = 0.08
# rough alignment estimate for an output at encoder output index t
t_seconds = t * encoder_freq
Note that this is only a rough estimate: with RNN-T based models the actual alignment is usually slightly off, as shown in Figure 1 of this paper.
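If you need word-level rather than token-level timestamps, the recorded (t, pred) pairs can be merged afterwards. The following is only a minimal sketch, not LibreASR code: it assumes pred has already been decoded to a subword string, that word starts are marked with a SentencePiece-style "▁" prefix, and that one encoder step covers 80 ms. The function name `words_with_timestamps` and both constants are made up for illustration, so adjust them to your tokenizer and model.

```python
from typing import List, Tuple

# Assumption: one encoder output step covers 80 ms (depends on the model).
ENCODER_STEP_SECONDS = 0.08
# Assumption: the tokenizer marks word starts with this prefix (SentencePiece style).
WORD_START = "\u2581"


def words_with_timestamps(
    alignments: List[Tuple[int, str]],
    step_seconds: float = ENCODER_STEP_SECONDS,
) -> List[Tuple[str, float]]:
    """Merge (t, token) pairs into (word, rough start time in seconds) pairs."""
    words: List[list] = []
    for t, token in alignments:
        starts_word = token.startswith(WORD_START) or not words
        piece = token.lstrip(WORD_START)
        if starts_word:
            # A new word begins at the encoder step of its first token.
            words.append([piece, t * step_seconds])
        else:
            # Continuation token: extend the current word, keep its start time.
            words[-1][0] += piece
    # Drop empty entries and round the times for readability.
    return [(word, round(start, 3)) for word, start in words if word]


# Hypothetical list collected inside the decoding loop as (t, pred) pairs.
alignments = [(0, "\u2581he"), (2, "llo"), (5, "\u2581world")]
result = words_with_timestamps(alignments)
```

Each word takes the time of its first token, so the caveat above about RNN-T alignments being slightly off applies to these word times as well.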