Montreal-Forced-Aligner
[BUG] Certain files error in mfa validate: "utterances that need a larger beam to align"
[x] Have you updated to latest MFA version? (version 2.0.0rc1)
[x] Have you tried rerunning the command with the --clean flag?
Describe the issue
Here are the steps I've taken:
- I ran mfa align on 24 text and wav files. It seemed to run with no errors, except that 2 of the resulting TextGrid files did not have any alignments in them (see below for an example of the TextGrid contents).
- I tried to see if there was any difference between these text/wav files and the ones that worked, but could not see any.
- I went back and ran mfa validate and received the message "2 utterances that need a larger beam to align".
- I tried changing the beam via --beam=##, trying 10, 20, 30, 40, 100, 200 and 1000, and none of them worked.
- Now I'm not sure what to try next and any input would be greatly appreciated.
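To pin down which output files came back without alignments, a check like the following could be run over the output directory. This is a minimal sketch, not part of MFA: the function names `count_labeled_intervals` and `find_empty_textgrids` are hypothetical, and it assumes the output is in Praat's long TextGrid format with one `text = "..."` line per interval.

```python
import re
from pathlib import Path
from typing import List

# Matches a long-format TextGrid label line such as:  text = "hello"
_LABEL = re.compile(r'^\s*text\s*=\s*"(.*)"\s*$')


def count_labeled_intervals(textgrid_text: str) -> int:
    """Count intervals whose label is not empty (assumes single-line labels)."""
    count = 0
    for line in textgrid_text.splitlines():
        match = _LABEL.match(line)
        if match and match.group(1).strip():
            count += 1
    return count


def find_empty_textgrids(output_dir: str) -> List[str]:
    """Return paths of .TextGrid files under output_dir with no labeled intervals."""
    empty = []
    for path in sorted(Path(output_dir).rglob("*.TextGrid")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if count_labeled_intervals(text) == 0:
            empty.append(str(path))
    return empty
```

Calling `find_empty_textgrids("[output path]")` on the alignment output directory would list the files that contain only the header and no labeled intervals.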
For Reproducing your issue Please fill out the following:
- Corpus structure
- What language is the corpus in? English
- How many files/speakers? - not differentiating between speakers in my text files, so effectively 1 speaker, overall 22 files worked, 2 didn't
- Are you using lab files or TextGrid files for input? - I'm inputting wav and .txt files, wanting TextGrid output
- Dictionary
- Are you using a dictionary from MFA? If so, which one? - Yes, using librispeech-lexicon.txt
- If it's a custom dictionary, what is the phoneset?
- Acoustic model - I don't think so?
- If you're using an acoustic model, is it one download through MFA? If so, which one?
- If it's a model you've trained, what data was it trained on?
Log file
Please attach the log file for the run that encountered an error (by default these will be stored in ~/Documents/MFA):
align.0.log
compile_train_graphs.0.log
I will also attach an example of the text files used: clip_E1_3_3.txt
- What's in the TextGrid file:
File type = "ooTextFile"
Object class = "TextGrid"
xmin = 0
xmax = 118.04266666666666
tiers?
Desktop (please complete the following information):
- OS: macOS Monterey, Version 12.0.1
- Any other details about the setup: running in terminal on local computer
Additional context
Command I use to align: mfa align [input path] [librarypath]/librispeech-lexicon.txt" [output path]
Command to validate: mfa validate [input path] [librarypath]/librispeech-lexicon.txt" [output path] --clean --beam=20
Output from validate:
INFO - Setting up corpus information...
INFO - Loading corpus from source files...
100%|██████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00, 1.49it/s]
INFO - Number of speakers in corpus: 1, average number of utterances per speaker: 3.0
INFO - Setting up training data...
INFO - Generating base features (mfcc)...
INFO - Generating MFCCs...
100%|██████████████████████████████████████████████████████████████████| 3/3 [00:07<00:00, 2.48s/it]
INFO - Calculating CMVN...
INFO - Skipping transcription testing
INFO - Finished initializing!
Corpus
3 sound files 3 lab files 0 textgrid files 1 speakers 3 utterances 328.043 seconds total duration
Sound file read errors
There were no issues reading sound files.
Feature generation
There were no utterances missing features.
Files without transcriptions
There were no sound files missing transcriptions.
Transcriptions without sound files
There were 3 transcription files missing sound files. Please see
/Users/eprzysinda/Documents/MFA/MFA_test_validate_pretrained/transcriptions_missing_sound_files.csv for a list.
Text file read errors
There were no issues reading text files.
Dictionary
Out of vocabulary words
7 OOV word types
8 total OOV tokens
For a full list of the word types, please see:
/Users/eprzysinda/Documents/MFA/MFA_test_validate_pretrained/oovs_found.txt
For a by-utterance breakdown of missing words, see:
/Users/eprzysinda/Documents/MFA/MFA_test_validate_pretrained/utterance_oovs.txt
Acoustic model compatibility
There were no phones in the dictionary without acoustic models.
Alignment
INFO - Compiling training graphs...
100%|██████████████████████████████████████████████████████████████████| 3/3 [00:05<00:00, 1.69s/it]
INFO - Generating alignments...
33%|██████████████████████ | 1/3 [00:12<00:25, 12.60s/it]
0 utterances were too short to be aligned
2 utterances that need a larger beam to align
There were 2 unaligned utterances out of 3 after initial training. For details, please see:
/Users/eprzysinda/Documents/MFA/MFA_test_validate_pretrained/unalignable_files.csv
1 utterances were successfully aligned
0 utterances were too short to be aligned
2 utterances that need a larger beam to align
There were 2 unaligned utterances out of 3 after initial training. For details, please see:
/Users/eprzysinda/Documents/MFA/MFA_test_validate_pretrained/unalignable_files.csv
1 utterances were successfully aligned
INFO - Done! Everything took 48.94140601158142 seconds
The text file looks pretty long. How long is the audio file? Is it possible for you to chunk it into an input TextGrid like the one described here: https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/corpus_structure.html#textgrid-format
The logs you attached only had one processed file in them (MFA-Office-E1-1-1-0-30-168708333333335, which looks like it aligned successfully and is 30 seconds long), so I'm not sure if there are other logs or files in there. It also looks like you're running an older version, so I'd recommend upgrading to the latest and re-running the alignment with --clean, as there have been a number of fixes and improvements.
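For the chunking suggestion, here is a rough sketch of how an input TextGrid could be written by hand in Python. It is not an MFA utility: `build_textgrid` is a hypothetical helper, the segment times and transcripts have to come from your own rough segmentation of the recording, and the tier name is assumed to be used as the speaker name. Gaps between segments are filled with empty intervals so that the tier covers the whole file.

```python
from typing import List, Tuple

# (start_seconds, end_seconds, transcript) for one chunk of the recording
Segment = Tuple[float, float, str]


def build_textgrid(segments: List[Segment], duration: float,
                   speaker: str = "speaker1") -> str:
    """Build a long-format Praat TextGrid with one interval tier."""
    intervals = []
    cursor = 0.0
    for start, end, text in sorted(segments):
        if start < cursor or end <= start or end > duration:
            raise ValueError(f"bad or overlapping segment: {start}-{end}")
        if start > cursor:
            # Fill the gap before this chunk with an empty interval.
            intervals.append((cursor, start, ""))
        intervals.append((start, end, text))
        cursor = end
    if cursor < duration:
        intervals.append((cursor, duration, ""))

    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        '',
        'xmin = 0',
        f'xmax = {duration}',
        'tiers? <exists>',
        'size = 1',
        'item []:',
        '    item [1]:',
        '        class = "IntervalTier"',
        f'        name = "{speaker}"',
        '        xmin = 0',
        f'        xmax = {duration}',
        f'        intervals: size = {len(intervals)}',
    ]
    for i, (start, end, text) in enumerate(intervals, start=1):
        escaped = text.replace('"', '""')  # Praat doubles quotes inside labels
        lines += [
            f'        intervals [{i}]:',
            f'            xmin = {start}',
            f'            xmax = {end}',
            f'            text = "{escaped}"',
        ]
    return "\n".join(lines) + "\n"
```

The returned string would be saved next to the wav file under the same base name (for example clip_E1_3_3.TextGrid in place of clip_E1_3_3.txt), so that each chunk is aligned as its own shorter utterance.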