
word timestamp

Open dangvansam opened this issue 3 years ago • 1 comments

Can I get word timestamps when predicting on an audio file?

dangvansam avatar May 06 '21 08:05 dangvansam

Hey!

You can extract alignment information by adding some code after line 886:

https://github.com/iceychris/LibreASR/blob/f08c8e89a34c6cc3dbf9f2db86f7cc87d84ab003/libreasr/lib/models.py#L853-L892

Save the current encoder timestamp index t (from line 855) together with the current output token pred in a list.
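As a rough illustration of that bookkeeping, here is a minimal sketch of a greedy transducer decoding loop that records `(t, pred)` pairs. It is not the code from `models.py`: the function name `greedy_decode_with_alignment`, the `step` callback, the `BLANK` id, and the `max_symbols_per_frame` cap are all hypothetical stand-ins for whatever the real loop uses.

```python
from typing import Callable, List, Tuple

BLANK = 0  # assumed id of the blank token; the real model may use another id


def greedy_decode_with_alignment(
    num_frames: int,
    step: Callable[[int, List[int]], int],
    max_symbols_per_frame: int = 5,
) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Greedy decode that also remembers at which encoder index each token was emitted.

    `step(t, history)` is a hypothetical callback standing in for the
    predictor + joint network: it returns the most likely token id at
    encoder index `t`, given the tokens emitted so far.
    """
    tokens: List[int] = []
    alignment: List[Tuple[int, int]] = []  # (encoder index t, token id pred)
    for t in range(num_frames):
        # A transducer may emit several tokens per frame; cap it to avoid loops.
        for _ in range(max_symbols_per_frame):
            pred = step(t, tokens)
            if pred == BLANK:
                break  # blank means: move on to the next encoder frame
            tokens.append(pred)
            alignment.append((t, pred))  # the extra line needed for alignments
    return tokens, alignment
```

The only change relative to a plain greedy loop is the `alignment.append((t, pred))` line; everything else is scaffolding so the sketch can run on its own.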

To convert t to seconds, you could use something like:

# time step between encoder frames in seconds; depends on the model architecture,
# usually 80 ms
encoder_freq = 0.08

# rough alignment estimate for an output at encoder output index t
t_seconds = t * encoder_freq

Note that this is only a rough estimate: with RNN-T based models the actual alignment is usually slightly off, as shown in Figure 1 of this paper.
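To get from token-level pairs to word timestamps, the saved list can be grouped into words. The sketch below assumes the tokens have already been mapped to subword strings and that a new word starts with the SentencePiece-style marker `▁`; both are assumptions about the tokenizer, and `word_timestamps` is a hypothetical helper, not part of LibreASR. The end time is taken as one frame after the last token of the word, which is just as approximate as the start time.

```python
from typing import List, Tuple

ENCODER_FREQ = 0.08    # seconds per encoder frame (assumed; depends on the model)
WORD_START = "\u2581"  # "▁", assumed SentencePiece-style word-boundary marker


def word_timestamps(
    alignment: List[Tuple[int, str]],
    encoder_freq: float = ENCODER_FREQ,
) -> List[Tuple[str, float, float]]:
    """Group (encoder index, subword) pairs into (word, start_s, end_s) triples."""
    words: List[Tuple[str, float, float]] = []
    current, start_t, last_t = "", 0, 0
    for t, piece in alignment:
        # A piece carrying the marker closes the word collected so far.
        if piece.startswith(WORD_START) and current:
            words.append((current, start_t * encoder_freq, (last_t + 1) * encoder_freq))
            current = ""
        if not current:
            start_t = t  # first piece of a new word
        current += piece.lstrip(WORD_START)
        last_t = t
    if current:  # flush the final word
        words.append((current, start_t * encoder_freq, (last_t + 1) * encoder_freq))
    return words


if __name__ == "__main__":
    pairs = [(2, "\u2581hel"), (3, "lo"), (10, "\u2581world")]
    for word, start, end in word_timestamps(pairs):
        print(f"{word}: {start:.2f}s - {end:.2f}s")
```

Since the underlying alignment is only approximate, these word boundaries are best treated as estimates.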

iceychris avatar May 06 '21 11:05 iceychris