Montreal-Forced-Aligner TextGrid format for stereo recording and transcription on separate tiers

Open jah238 opened this issue 3 years ago • 4 comments

I understand that TextGrid format supports:

stereo sound files; and
TextGrids with speakers on separate tiers

I have a dataset with both properties.

+-- textgrid_corpus_directory
|   --- recording1.wav
|   --- recording1.TextGrid

recording1.wav is a long wav file with 2 channels: speaker A in one channel and speaker B in the other channel. In recording1.TextGrid, speaker A transcriptions are on tier 1 and speaker B transcriptions are on tier 2.
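To make the layout above concrete, here is a minimal sketch that writes each channel of a 2-channel PCM WAV to its own mono file, using only Python's standard `wave` module. It is not a step MFA requires; it is just one way to check by ear which speaker sits in which channel. The function name `split_stereo` and the output file names are hypothetical.

```python
import wave


def split_stereo(path, left_path, right_path):
    """Write each channel of a 2-channel PCM WAV to its own mono file."""
    with wave.open(path, "rb") as src:
        if src.getnchannels() != 2:
            raise ValueError("expected a 2-channel file")
        width = src.getsampwidth()  # bytes per sample
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    # Samples are interleaved: [ch0][ch1][ch0][ch1]..., each `width` bytes.
    step = 2 * width
    left = b"".join(frames[i:i + width] for i in range(0, len(frames), step))
    right = b"".join(frames[i + width:i + step] for i in range(0, len(frames), step))

    for out_path, data in ((left_path, left), (right_path, right)):
        with wave.open(out_path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(width)
            dst.setframerate(rate)
            dst.writeframes(data)
```

For example, `split_stereo("recording1.wav", "recording1_ch0.wav", "recording1_ch1.wav")` would produce one mono file per channel. The whole file is read into memory, so a very long recording may need a chunked variant.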

I'm wondering how best to prepare these data.

When I run annotator and try to update an utterance, I get the following error: WARNING - Unable to find utterance A_1213p01fm01am11lt_12_0971_12_5856_channel1 match in [...]1213p01fm01am11lt/1213p01fm01am11lt.TextGrid

Mar 18 '21 21:03 jah238

Ah yeah, so the annotator doesn't support that quite yet. I've done a fair bit of work on it the past few weeks as I've been annotating some data of my own, and maybe it'll be useful for you too? I just need to polish it up a bit and do some optimizations to make sure it stays responsive.

Mar 19 '21 01:03 mmcauliffe

Thanks! I'll watch here for an update.

Mar 23 '21 19:03 jah238

Should be usable now, you can upgrade via pip install montreal-forced-aligner -U. Also note for G2P, you will need to upgrade Pynini via conda upgrade -c conda-forge openfst pynini ngram baumwelch.

I've mostly tested it on single-channel files, correcting intervals generated from YouTube captions, so the stereo aspects are not as well tested. For anything you annotate, I would recommend first setting up the tiers with a few intervals for each speaker, corresponding to the different channels, and then it should work OK. If you don't mind the high likelihood of issues, feel free to try it out and let me know what goes wrong, and I'll try to fix it up quickly.
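As a rough illustration of setting up one tier per speaker ahead of time, the sketch below writes a TextGrid in Praat's long text format with plain Python. The function name `write_textgrid` and the tier names are hypothetical, and nothing here says how the annotator maps tiers to channels; it only produces a file with one interval tier per speaker.

```python
def write_textgrid(path, duration, tiers):
    """Write a long-format TextGrid.

    tiers: list of (tier_name, [(start, end, text), ...]); each interval
    list should be sorted and cover 0..duration without gaps.
    """
    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        '',
        'xmin = 0',
        f'xmax = {duration}',
        'tiers? <exists>',
        f'size = {len(tiers)}',
        'item []:',
    ]
    for t, (name, intervals) in enumerate(tiers, start=1):
        lines += [
            f'    item [{t}]:',
            '        class = "IntervalTier"',
            f'        name = "{name}"',
            '        xmin = 0',
            f'        xmax = {duration}',
            f'        intervals: size = {len(intervals)}',
        ]
        for i, (start, end, text) in enumerate(intervals, start=1):
            escaped = text.replace('"', '""')  # Praat doubles embedded quotes
            lines += [
                f'        intervals [{i}]:',
                f'            xmin = {start}',
                f'            xmax = {end}',
                f'            text = "{escaped}"',
            ]
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
```

A call such as `write_textgrid("recording1.TextGrid", 10.0, [("A", [(0, 4.0, "hello"), (4.0, 10.0, "")]), ("B", [(0, 10.0, "")])])` would give two tiers, with empty intervals filling the stretches where a speaker is silent.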

Mar 25 '21 01:03 mmcauliffe

Thanks! So I loaded a corpus with the format described above (a .wav/.TextGrid pair).

Save dictionary: I successfully saved new entries to the dictionary. After the 2nd try, annotator crashed, although the dictionary update I was attempting was successful. Terminal message: "Aborted (core dumped)"
Save current file: This produced a dialog box. The detail was: 'dict' object has no attribute 'split'
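For context, that detail text is the standard Python `AttributeError` raised when a string method is called on a dictionary. The sketch below reproduces the message in isolation; the `first_word` helper is hypothetical and does not show where in annotator the error actually occurs.

```python
def first_word(text):
    # Works for a str; fails if a dict is passed in by mistake.
    return text.split()[0]


try:
    first_word({"text": "hello world"})
except AttributeError as err:
    message = str(err)  # the same kind of detail shown in the dialog box
```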

Mar 25 '21 13:03 jah238
