Montreal-Forced-Aligner
word/phone identification for the alignment
Hi,
I am working on a corpus where I have additional information for each word I pass to MFA. The issue is that the MFA align output doesn't let me track the words I passed in with the transcript. I always end up having to merge the MFA output back with the input, which carries the extra information for each word (merging on the word, its speaker, its timestamp...). This always makes me lose some of the data and forces me to guess how MFA tokenizes words internally.
I would like a way to get the output with some sort of index dictionary: a speaker_id,word_id pair in the output that maps to the same speaker_id,word_id in the input. This would help me a lot, because I could apply the alignment without losing the labels on the data.
Example:

    word_vec = ["word1", "word2"]

    input:
    {"xmin": 0, "xmax": 1, "text": " ".join(word_vec)}

    output:
    [
        {"xmin": 0, "xmax": 0.5, "text": "word1", "xmin_original": 0, "word_num": 0},
        {"xmin": 0.5, "xmax": 1, "text": "word2", "xmin_original": 0, "word_num": 1},
    ]

This way, if the input carries other information about the transcript, I can merge it into the output.
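Until MFA offers something like this, one possible client-side workaround is to re-derive the indices after alignment. The sketch below assumes several things. First, the aligned word intervals for one utterance have already been read from MFA's output (for example the words tier of the TextGrid; reading that file is not shown). Second, MFA keeps the words in transcript order. Third, the only differences between input and output tokens are casing or dropped/split words. `attach_word_ids` is a hypothetical helper, not part of MFA. It uses Python's `difflib.SequenceMatcher` to align the two token lists and produces the `xmin_original` / `word_num` fields from the example above. Output words it cannot match get `word_num = None` rather than a guessed index.

```python
from difflib import SequenceMatcher


def attach_word_ids(utterance, aligned_words):
    """Hypothetical helper: map aligned word intervals back to input word indices.

    utterance: {"xmin", "xmax", "text"} as passed to the aligner.
    aligned_words: list of {"xmin", "xmax", "text"} for that utterance, in time
        order (assumed to be already parsed from the aligner's output).
    Returns copies of aligned_words with "xmin_original" and "word_num" added.
    """
    input_tokens = utterance["text"].split()
    output_tokens = [w["text"] for w in aligned_words]

    # Compare case-insensitively, assuming the aligner may lowercase words.
    matcher = SequenceMatcher(
        a=[t.lower() for t in input_tokens],
        b=[t.lower() for t in output_tokens],
        autojunk=False,
    )

    # Copy each interval so the caller's data is not mutated; default to unmatched.
    result = [
        dict(w, xmin_original=utterance["xmin"], word_num=None)
        for w in aligned_words
    ]

    # Each matching block says a[i:i+size] == b[j:j+size]; record input index i+k.
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            result[block.b + k]["word_num"] = block.a + k
    return result


if __name__ == "__main__":
    word_vec = ["word1", "word2"]
    utterance = {"xmin": 0, "xmax": 1, "text": " ".join(word_vec)}
    aligned = [
        {"xmin": 0, "xmax": 0.5, "text": "word1"},
        {"xmin": 0.5, "xmax": 1, "text": "word2"},
    ]
    for word in attach_word_ids(utterance, aligned):
        print(word)
```

If each utterance is processed per speaker, `(speaker_id, word_num)` then serves as the join key back to the labelled input. Tokens that MFA splits or normalizes differently (e.g. punctuation attached to a word) show up as `None` and can be checked by hand instead of being silently mis-merged.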