Montreal-Forced-Aligner
No TextGrid files in output folder, no error message
I've encountered a problem running MFA where no error is thrown and no TextGrids are written to the output folder.
I have 9 speakers, with ~20 minutes of speech for each speaker. My input is single-tier TextGrids containing relatively short utterances with orthographic transcriptions. I ran the mfa_align command using the pretrained English model and the LibriSpeech dictionary. The aligner seems to run fine and no error is produced, but there are no TextGrid files in the specified output folder.
I see this behaviour with versions 1.1.0 and 1.0.1, on both Linux and Windows.
Does anybody have an idea of what is going on?
I got a similar issue when trying the example: no TextGrid file was produced in the output directory. I checked the log file (default path is ~/Documents/MFA/<corpus_name>/logging/corpus.log) and saw something like "The following utterances were ignored due to lack of features." It seems to me the binary has trouble getting MFCC features from the audio files.
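A minimal sketch for pulling that message out of corpus.log, assuming the wording quoted above (it may differ between MFA versions). The function name find_ignored_lines is hypothetical, not part of MFA, and <corpus_name> in the path is a placeholder you must replace with your corpus folder name.

```python
from pathlib import Path
from typing import List

# Wording as quoted in the comment above; other MFA versions may phrase it differently.
IGNORED_MARKER = "ignored due to lack of features"


def find_ignored_lines(log_text: str) -> List[str]:
    """Return the log lines that report utterances skipped for missing features."""
    return [line for line in log_text.splitlines() if IGNORED_MARKER in line]


if __name__ == "__main__":
    # Default MFA 1.x log location; <corpus_name> is a placeholder.
    log_path = Path.home() / "Documents" / "MFA" / "<corpus_name>" / "logging" / "corpus.log"
    if log_path.exists():
        text = log_path.read_text(encoding="utf-8", errors="replace")
        for line in find_ignored_lines(text):
            print(line)
    else:
        print(f"No log found at {log_path}")
```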
I also got no output in ../Montreal-Forced-Aligner/examples/alignment
bin/mfa_align ../Montreal-Forced-Aligner/examples/ch data-mandarin/chinese.dict.txt pretrained_models/mandarin.zip ../Montreal-Forced-Aligner/examples/alignment
Setting up corpus information...
Number of speakers in corpus: 1, average number of utterances per speaker: 5.0
Creating dictionary information...
Setting up corpus_data directory...
Generating base features (mfcc)...
Calculating CMVN...
Done with setup.
Done! Everything took 1.4139506816864014 seconds
Using 1.0.1 on Mac.
I got output (and it is accurate) when I try my own English example, my own Spanish example, and the Mandarin example provided at https://montreal-forced-aligner.readthedocs.io/en/latest/example.html, but no output when I try my own Mandarin wav. I got:
mandarin_wav sample_mandarin_dict.txt pretrained_models/mandarin.zip output
Setting up corpus information...
Number of speakers in corpus: 1, average number of utterances per speaker: 2.0
Creating dictionary information...
Setting up training data...
Calculating MFCCs...
Calculating CMVN...
Number of speakers in corpus: 1, average number of utterances per speaker: 2.0
Done with setup.
100%|█████████████████████████████████████████████| 2/2 [00:01<00:00, 1.07it/s]
Done! Everything took 4.492555856704712 seconds
People have pointed out that this is the result of failed alignment, with the errors logged in ~/Documents/MFA/XXXXX/tri_ali/log/align.0.0.log
(https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner/issues/84).
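If you have several corpora under ~/Documents/MFA, a small sketch like the one below lists every alignment log at once. It assumes the MFA 1.x directory layout from the path above (<corpus>/tri_ali/log/align.*.log); newer MFA versions may organise their working directories differently, and find_align_logs is a hypothetical helper name.

```python
from pathlib import Path
from typing import List


def find_align_logs(mfa_root: Path) -> List[Path]:
    """List align.*.log files under every <corpus>/tri_ali/log/ directory."""
    # Layout assumed from the MFA 1.x path quoted above.
    return sorted(mfa_root.glob("*/tri_ali/log/align.*.log"))


if __name__ == "__main__":
    for log in find_align_logs(Path.home() / "Documents" / "MFA"):
        print(log)
```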
Indeed, compare the log for my successful Spanish run:
gmm-align-compiled --transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1 --beam=10 --retry-beam=40 --careful=false 'gmm-boost-silence --boost=1.0 6 "/Users/xzfang/Documents/MFA/sample_spanish_wav/tri_ali/0.mdl" - |' ark:/Users/xzfang/Documents/MFA/sample_spanish_wav/tri_ali/fsts.0 ark:/Users/xzfang/Documents/MFA/sample_spanish_wav/train/split1/cmvndeltafeats_fmllr.0 ark:-
gmm-boost-silence --boost=1.0 6 /Users/xzfang/Documents/MFA/sample_spanish_wav/tri_ali/0.mdl -
WARNING (gmm-boost-silence[5.4.251~1-094d2]:main():gmm-boost-silence.cc:82) The pdfs for the silence phones may be shared by other phones (note: this probably does not matter.)
LOG (gmm-boost-silence[5.4.251~1-094d2]:main():gmm-boost-silence.cc:93) Boosted weights for 5 pdfs, by factor of 1
LOG (gmm-boost-silence[5.4.251~1-094d2]:main():gmm-boost-silence.cc:103) Wrote model to -
LOG (gmm-align-compiled[5.4.251~1-094d2]:main():gmm-align-compiled.cc:127) Savannah_beso_a_Emilia
LOG (gmm-align-compiled[5.4.251~1-094d2]:main():gmm-align-compiled.cc:127) Savannah_pateo_a_Emilia
LOG (gmm-align-compiled[5.4.251~1-094d2]:main():gmm-align-compiled.cc:135) Overall log-likelihood per frame is -109.2 over 551 frames.
LOG (gmm-align-compiled[5.4.251~1-094d2]:main():gmm-align-compiled.cc:137) Retried 0 out of 2 utterances.
LOG (gmm-align-compiled[5.4.251~1-094d2]:main():gmm-align-compiled.cc:139) Done 2, errors on 0
with the log for my failed Mandarin run:
gmm-align-compiled --transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1 --beam=10 --retry-beam=40 --careful=false 'gmm-boost-silence --boost=1.0 6 "/Users/xzfang/Documents/MFA/sample_mandarin_wav_file_name_no_chinese_char/tri_ali/0.mdl" - |' ark:/Users/xzfang/Documents/MFA/sample_mandarin_wav_file_name_no_chinese_char/tri_ali/fsts.0 ark:/Users/xzfang/Documents/MFA/sample_mandarin_wav_file_name_no_chinese_char/train/split1/cmvndeltafeats_fmllr.0 ark:-
gmm-boost-silence --boost=1.0 6 /Users/xzfang/Documents/MFA/sample_mandarin_wav_file_name_no_chinese_char/tri_ali/0.mdl -
WARNING (gmm-boost-silence[5.4.251~1-094d2]:main():gmm-boost-silence.cc:82) The pdfs for the silence phones may be shared by other phones (note: this probably does not matter.)
LOG (gmm-boost-silence[5.4.251~1-094d2]:main():gmm-boost-silence.cc:93) Boosted weights for 5 pdfs, by factor of 1
LOG (gmm-boost-silence[5.4.251~1-094d2]:main():gmm-boost-silence.cc:103) Wrote model to -
LOG (gmm-align-compiled[5.4.251~1-094d2]:main():gmm-align-compiled.cc:127) 1
WARNING (gmm-align-compiled[5.4.251~1-094d2]:AlignUtteranceWrapper():decoder-wrappers.cc:466) Retrying utterance 1 with beam 40
WARNING (gmm-align-compiled[5.4.251~1-094d2]:AlignUtteranceWrapper():decoder-wrappers.cc:475) Did not successfully decode file 1, len = 666
LOG (gmm-align-compiled[5.4.251~1-094d2]:main():gmm-align-compiled.cc:127) 2
WARNING (gmm-align-compiled[5.4.251~1-094d2]:AlignUtteranceWrapper():decoder-wrappers.cc:466) Retrying utterance 2 with beam 40
WARNING (gmm-align-compiled[5.4.251~1-094d2]:AlignUtteranceWrapper():decoder-wrappers.cc:475) Did not successfully decode file 2, len = 666
LOG (gmm-align-compiled[5.4.251~1-094d2]:main():gmm-align-compiled.cc:135) Overall log-likelihood per frame is nan over 0 frames.
LOG (gmm-align-compiled[5.4.251~1-094d2]:main():gmm-align-compiled.cc:137) Retried 2 out of 2 utterances.
LOG (gmm-align-compiled[5.4.251~1-094d2]:main():gmm-align-compiled.cc:139) Done 0, errors on 2
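The difference between the two logs comes down to a few lines, so a rough sketch like the following can summarise an align log automatically. The regular expressions are written against the gmm-align-compiled lines shown in the two logs above and are an assumption for other Kaldi/MFA versions; summarize_align_log is a hypothetical name.

```python
import re
from typing import Dict, Union

# Patterns taken from the gmm-align-compiled output shown above.
DONE_RE = re.compile(r"Done (\d+), errors on (\d+)")
FAILED_RE = re.compile(r"Did not successfully decode file (\S+),")


def summarize_align_log(log_text: str) -> Dict[str, Union[int, list]]:
    """Return aligned/error counts and the IDs of utterances that failed to align."""
    done = errors = 0
    for match in DONE_RE.finditer(log_text):
        done += int(match.group(1))
        errors += int(match.group(2))
    failed = FAILED_RE.findall(log_text)
    return {"done": done, "errors": errors, "failed": failed}
```

Run on the Mandarin log above, this reports 0 done, 2 errors and failed utterances '1' and '2'. If both counts are 0, no utterances reached the aligner at all, which is a different failure from utterances being rejected during decoding.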
I hope this is only a problem with Mandarin; I am using p2fa (https://web.sas.upenn.edu/phonetics-lab/facilities/) for both English and Mandarin without problems.
By the way, stereo is not the problem here (https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner/issues/107): 1.0.1 can handle stereo, and my English wav was stereo.
I had the same issue (no TextGrids output files).
Making sure all the words are in the dictionary fixed it for me (i.e. no prompt to fix words not in the dictionary, and an empty oovs_found.txt file).
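Before running the aligner, you can check for out-of-vocabulary words yourself. This is a minimal sketch, assuming a dictionary with one entry per line (the word followed by whitespace-separated phones) and transcripts as plain strings; MFA applies its own text normalisation (case, punctuation), so treat the result as a rough pre-check rather than what MFA will write to oovs_found.txt. Both function names are hypothetical.

```python
from typing import Iterable, List, Set


def load_dictionary_words(dict_lines: Iterable[str]) -> Set[str]:
    """Collect headwords, assuming each line is: word phone1 phone2 ..."""
    words = set()
    for line in dict_lines:
        parts = line.split()
        if parts:  # skip blank lines
            words.add(parts[0])
    return words


def find_oovs(transcripts: Iterable[str], dictionary_words: Set[str], casefold: bool = True) -> List[str]:
    """Return sorted transcript tokens that have no dictionary entry."""
    if casefold:
        dictionary_words = {w.casefold() for w in dictionary_words}
    oovs = set()
    for text in transcripts:
        for token in text.split():
            word = token.casefold() if casefold else token
            if word not in dictionary_words:
                oovs.add(word)
    return sorted(oovs)
```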
Using Windows 10, I had the same issue: no TextGrid output files; instead I found an empty oovs_found.txt file.
This was the result of failed alignment; the alignment log contained:
gmm-align-compiled --transition-scale=1.0 --acoustic-scale=0.1 --self-loop-scale=0.1 --beam=10 --retry-beam=40 --careful=false 'gmm-boost-silence --boost=1.0 6 "C:\Users\Brandon/Documents/MFA\data\tri_ali\0.mdl" - |' 'ark:C:\Users\Brandon/Documents/MFA\data\tri_ali\fsts.0' 'ark:C:\Users\Brandon/Documents/MFA\data\train\split1\cmvndeltafeats_fmllr.0' ark:-
gmm-boost-silence --boost=1.0 6 'C:\Users\Brandon/Documents/MFA\data\tri_ali\0.mdl' -
WARNING (gmm-boost-silence[5.4-win]:main():e:\dev\tools\kaldi\src\gmmbin\gmm-boost-silence.cc:82) The pdfs for the silence phones may be shared by other phones (note: this probably does not matter.)
LOG (gmm-boost-silence[5.4-win]:main():e:\dev\tools\kaldi\src\gmmbin\gmm-boost-silence.cc:93) Boosted weights for 5 pdfs, by factor of 1
LOG (gmm-boost-silence[5.4-win]:main():e:\dev\tools\kaldi\src\gmmbin\gmm-boost-silence.cc:103) Wrote model to -
LOG (gmm-align-compiled[5.4-win]:main():e:\dev\tools\kaldi\src\gmmbin\gmm-align-compiled.cc:135) Overall log-likelihood per frame is -nan(ind) over 0 frames.
LOG (gmm-align-compiled[5.4-win]:main():e:\dev\tools\kaldi\src\gmmbin\gmm-align-compiled.cc:137) Retried 0 out of 0 utterances.
LOG (gmm-align-compiled[5.4-win]:main():e:\dev\tools\kaldi\src\gmmbin\gmm-align-compiled.cc:139) Done 0, errors on 0
No errors, but nothing done either.
Try increasing the beam value. By default it is 10. I had a 30-second audio file, and for that I used beam=100. If you are using the CLI, add the argument: mfa align ... --beam 100
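If you drive MFA from a script, the same flag can be passed through an argument list. This sketch only builds the command; it assumes the positional order corpus, dictionary, acoustic model, output directory (the order used in the commands in this thread), and the paths in the example are hypothetical placeholders.

```python
import shlex
from typing import List


def build_align_command(corpus_dir: str, dictionary: str, acoustic_model: str,
                        output_dir: str, beam: int = 100) -> List[str]:
    """Return an `mfa align` argument list with a wider decoding beam."""
    return ["mfa", "align", corpus_dir, dictionary, acoustic_model, output_dir,
            "--beam", str(beam)]


if __name__ == "__main__":
    # Hypothetical paths; replace with your own.
    cmd = build_align_command("my_corpus", "my_dict.txt", "english.zip", "my_output")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Passing a list (rather than one string) avoids shell quoting problems with paths that contain spaces, which is common on Windows.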
Apart from that, I found that TextGrids are also saved into the temporary directory: if you are using the argument -t or --temp, you will find your TextGrids in <folder_name>_pretrained_aligner/pretrained_aligner/textgrids.
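To confirm whether TextGrids were written somewhere other than the output folder, you can search the temporary directory recursively. A minimal sketch: it matches the .TextGrid suffix case-insensitively and makes no assumption about the exact subfolder names, which vary between MFA versions. find_textgrids is a hypothetical helper name.

```python
from pathlib import Path
from typing import List, Union


def find_textgrids(root: Union[str, Path]) -> List[Path]:
    """Recursively list .TextGrid files under root (suffix matched case-insensitively)."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() == ".textgrid"
    )
```

Point it at the folder you passed to -t (or at the output folder) and compare the count with the number of utterances you expected.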
Another project relies on this tool. I also encountered a similar problem when using it and haven't found a solution yet. Can anyone help?
Just encountered this problem: the log was not showing any problems ("Done XX, errors on 0") but no TextGrid files were appearing in the output folder. Increasing beam didn't work.
I eventually fixed the issue by adding the --clean flag when running align. It might be good to point out in the intro that the default behavior of validate and align is to not overwrite previous runs!
Adding --clean and --overwrite worked for me! https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/configuration/index.html
mfa align --clean --overwrite ...
I got this idea from the tutorial: https://www.youtube.com/watch?v=phVZijLo9ro