whisperX
Word segment found with no start and end time
I was using WhisperX v3.1.1 to transcribe an audio file. After aligning the Whisper output with the phoneme-based model, I found that one word has no start time and no end time. What might be the reason for this?
result['word_segments']
{'word': 'Your', 'start': 1302.662, 'end': 1302.842, 'score': 0.866},
{'word': '?'},
{'word': 'R.S.C.?', 'start': 1300.403, 'end': 1300.583, 'score': 0.474},
{'word': 'real', 'start': 1300.122, 'end': 1300.282, 'score': 0.339},
{'word': 'the', 'start': 1300.002, 'end': 1300.082, 'score': 0.892},
{'word': 'to', 'start': 1299.922, 'end': 1299.982, 'score': 0.998},
Limitations ⚠️ Transcript words which do not contain characters in the alignment model's dictionary, e.g. "2014." or "£13.60", cannot be aligned and therefore are not given a timing.
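If downstream code needs a timing on every word, one possible workaround is to detect the untimed entries and borrow boundaries from their timed neighbours. The following is only a minimal sketch: `find_untimed_words` and `fill_missing_times` are hypothetical helper names, not part of WhisperX. It assumes that the list is in chronological order and that an unaligned word simply lacks the `start` and `end` keys, as in the output above. The sample entries are taken from that output and reordered chronologically.

```python
def find_untimed_words(word_segments):
    """Return indices of word entries that lack a 'start' or 'end' key."""
    return [i for i, w in enumerate(word_segments)
            if "start" not in w or "end" not in w]


def fill_missing_times(word_segments):
    """Return a copy in which untimed words borrow times from timed neighbours.

    Assumption: word_segments is in chronological order.
    """
    filled = [dict(w) for w in word_segments]
    n = len(word_segments)
    for i in find_untimed_words(word_segments):
        # End time of the closest earlier word that has one, if any.
        prev_end = next((word_segments[j]["end"] for j in range(i - 1, -1, -1)
                         if "end" in word_segments[j]), None)
        # Start time of the closest later word that has one, if any.
        next_start = next((word_segments[j]["start"] for j in range(i + 1, n)
                           if "start" in word_segments[j]), None)
        # Fall back to whichever neighbour exists.
        start = prev_end if prev_end is not None else next_start
        end = next_start if next_start is not None else prev_end
        if start is not None:
            filled[i]["start"] = start
            filled[i]["end"] = end
            filled[i]["interpolated"] = True  # mark as a guess, not an alignment
    return filled


words = [
    {"word": "R.S.C.?", "start": 1300.403, "end": 1300.583, "score": 0.474},
    {"word": "?"},
    {"word": "Your", "start": 1302.662, "end": 1302.842, "score": 0.866},
]

print(find_untimed_words(words))
print(fill_missing_times(words)[1])
```

The filled-in times are a guess that spans the gap between neighbouring words, not a real alignment, which is why the sketch tags them with an `interpolated` flag and leaves the original list untouched.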
Does this affect the segment start/end as well? If a segment is "2014 something", would the segment start at "something" or at "2014"?