[BUG] Certain files error in mfa validate: "utterances that need a larger beam to align"

Open • eprzysinda opened this issue 2 years ago • 1 comment


- [x] Have you updated to the latest MFA version? (version 2.0.0rc1)
- [x] Have you tried rerunning the command with the --clean flag?

Describe the issue

Here are the steps I took:

1. I ran mfa align on 24 text and WAV files, and it seemed to run fine with no errors, except that 2 of the resulting TextGrid files did not have any alignments in them (see below for an example of the TextGrid contents).
2. I tried to see if there was any difference between these text/WAV files and the ones that worked, but could not find any.
3. I went back and ran mfa validate and received the message "2 utterances that need a larger beam to align".
4. I tried changing the beam via --beam=## with values 10, 20, 30, 40, 100, 200, and 1000, and none of them worked.

Now I'm not sure what to try next, and any input would be greatly appreciated.

For reproducing your issue, please fill out the following:

Corpus structure
- What language is the corpus in? English
- How many files/speakers? I'm not differentiating between speakers in my text files, so effectively 1 speaker; overall 22 files worked and 2 didn't.
- Are you using lab files or TextGrid files for input? I'm inputting .wav and .txt files and want TextGrid output.

Dictionary
- Are you using a dictionary from MFA? If so, which one? Yes, librispeech-lexicon.txt.
- If it's a custom dictionary, what is the phoneset?

Acoustic model: I don't think so?
- If you're using an acoustic model, is it one downloaded through MFA? If so, which one?
- If it's a model you've trained, what data was it trained on?

Log file

Please attach the log file for the run that encountered an error (by default these will be stored in ~/Documents/MFA).

Attached: align.0.log, compile_train_graphs.0.log

I'm also attaching an example text file I used: clip_E1_3_3.txt

What's in the TextGrid file:

    File type = "ooTextFile"
    Object class = "TextGrid"

    xmin = 0
    xmax = 118.04266666666666
    tiers? <exists>
    size = 2
    item []:
        item [1]:
            class = "IntervalTier"
            name = "words"
            xmin = 0
            xmax = 118.04266666666666
            intervals: size = 1
            intervals [1]:
                xmin = 0
                xmax = 118.04266666666666
                text = ""
        item [2]:
            class = "IntervalTier"
            name = "phones"
            xmin = 0
            xmax = 118.04266666666666
            intervals: size = 1
            intervals [1]:
                xmin = 0
                xmax = 118.04266666666666
                text = ""
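To find which outputs came back empty without opening each one, a quick scan for TextGrids whose every interval label is empty may help. This is a minimal sketch, assuming the long text format shown above and UTF-8 encoding; `is_unaligned` and `find_unaligned` are hypothetical helper names, not part of MFA.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches the quoted label on each `text = "..."` line of a long-format TextGrid.
TEXT_FIELD = re.compile(r'^\s*text = "(.*)"\s*$', re.MULTILINE)


def is_unaligned(textgrid_source: str) -> bool:
    """True if every interval label in the TextGrid is empty (or there are none)."""
    labels = TEXT_FIELD.findall(textgrid_source)
    return all(label.strip() == "" for label in labels)


def find_unaligned(output_dir: str) -> list[str]:
    """Paths of *.TextGrid files under output_dir whose intervals are all empty."""
    return [
        str(path)
        for path in sorted(Path(output_dir).rglob("*.TextGrid"))
        if is_unaligned(path.read_text(encoding="utf-8"))
    ]
```

Called on the alignment output directory, `find_unaligned` would be expected to list files like the one above, where each tier is a single empty interval spanning the whole recording.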

Desktop (please complete the following information):

OS: macOS Monterey, version 12.0.1
Any other details about the setup: running in terminal on local computer

Additional context

Command I use to align:

    mfa align [input path] [librarypath]/librispeech-lexicon.txt [output path]

Command to validate:

    mfa validate [input path] [librarypath]/librispeech-lexicon.txt [output path] --clean --beam=20
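The beam sweep from step 4 can be scripted so that each run is reproducible and its output easy to compare. A minimal sketch: the paths are hypothetical placeholders mirroring the bracketed ones in the commands above, only the `--clean` and `--beam` flags from the reported command are used, and with `dry_run=True` it just prints the commands instead of running them.

```python
import shlex
import subprocess

# Hypothetical placeholders: replace with your own paths, as in the commands above.
CORPUS = "/path/to/input"
DICTIONARY = "/path/to/librispeech-lexicon.txt"
OUTPUT = "/path/to/output"


def validate_command(beam: int) -> list:
    """Build the reported validate command for one --beam value."""
    return ["mfa", "validate", CORPUS, DICTIONARY, OUTPUT, "--clean", f"--beam={beam}"]


def sweep(beams, dry_run: bool = True) -> list:
    """Print (dry run) or execute one validate command per beam width."""
    commands = [validate_command(beam) for beam in beams]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))  # shlex.join needs Python 3.8+
        else:
            # check=False: keep sweeping even if one run exits non-zero.
            subprocess.run(cmd, check=False)
    return commands
```

For example, `sweep([10, 20, 30, 40, 100, 200, 1000])` prints the seven commands tried in step 4.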

Output from validate:

    INFO - Setting up corpus information...
    INFO - Loading corpus from source files...
    100%|██████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00, 1.49it/s]
    INFO - Number of speakers in corpus: 1, average number of utterances per speaker: 3.0
    INFO - Setting up training data...
    INFO - Generating base features (mfcc)...
    INFO - Generating MFCCs...
    100%|██████████████████████████████████████████████████████████████████| 3/3 [00:07<00:00, 2.48s/it]
    INFO - Calculating CMVN...
    INFO - Skipping transcription testing
    INFO - Finished initializing!

Corpus

    3 sound files
    3 lab files
    0 textgrid files
    1 speakers
    3 utterances
    328.043 seconds total duration
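The summary gives 328.043 seconds over 3 utterances, roughly 109 seconds per utterance on average, so each file is being treated as one very long utterance. A sketch using Python's standard `wave` module to list per-file durations; the 30-second threshold is an arbitrary assumption, and `wave` only reads PCM WAV files.

```python
import wave
from pathlib import Path


def wav_duration(path) -> float:
    """Length of a PCM WAV file in seconds: frame count / sample rate."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def long_files(corpus_dir, max_seconds: float = 30.0) -> list:
    """(file name, seconds) for WAVs longer than max_seconds, longest first."""
    found = []
    for path in Path(corpus_dir).rglob("*.wav"):
        seconds = wav_duration(path)
        if seconds > max_seconds:
            found.append((path.name, seconds))
    return sorted(found, key=lambda item: item[1], reverse=True)
```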

Sound file read errors

There were no issues reading sound files.

Feature generation

There were no utterances missing features.

Files without transcriptions

There were no sound files missing transcriptions.

Transcriptions without sound files

There were 3 transcription files missing sound files. Please see 
  /Users/eprzysinda/Documents/MFA/MFA_test_validate_pretrained/transcriptions_missing_sound_files.csv for a list.

Text file read errors

There were no issues reading text files.

Dictionary

Out of vocabulary words

7 OOV word types
8 total OOV tokens

For a full list of the word types, please see:

    /Users/eprzysinda/Documents/MFA/MFA_test_validate_pretrained/oovs_found.txt

For a by-utterance breakdown of missing words, see:

    /Users/eprzysinda/Documents/MFA/MFA_test_validate_pretrained/utterance_oovs.txt
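A rough local check of a transcript against the dictionary can be compared with oovs_found.txt. This sketch assumes each lexicon line starts with the headword followed by its phones (as in librispeech-lexicon.txt) and uses a simple lowercase letters-and-apostrophes tokenizer; that tokenizer is an assumption and will not match MFA's own text normalization exactly.

```python
from __future__ import annotations

import re


def load_lexicon_words(lexicon_text: str) -> set[str]:
    """Headwords of a lexicon where each line is `WORD PHONE PHONE ...`."""
    words = set()
    for line in lexicon_text.splitlines():
        parts = line.split()
        if parts:
            words.add(parts[0].lower())
    return words


def find_oovs(transcript: str, lexicon_words: set[str]) -> list[str]:
    """Transcript words missing from the lexicon, in order of first appearance."""
    missing: list[str] = []
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    for token in re.findall(r"[a-z']+", transcript.lower()):
        if token not in lexicon_words and token not in missing:
            missing.append(token)
    return missing
```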

Acoustic model compatibility

There were no phones in the dictionary without acoustic models.

Alignment

    INFO - Compiling training graphs...
    100%|██████████████████████████████████████████████████████████████████| 3/3 [00:05<00:00, 1.69s/it]
    INFO - Generating alignments...
    33%|██████████████████████ | 1/3 [00:12<00:25, 12.60s/it]
    0 utterances were too short to be aligned
    2 utterances that need a larger beam to align
    There were 2 unaligned utterances out of 3 after initial training. For details, please see:

    /Users/eprzysinda/Documents/MFA/MFA_test_validate_pretrained/unalignable_files.csv

    1 utterances were successfully aligned
    0 utterances were too short to be aligned
    2 utterances that need a larger beam to align
    There were 2 unaligned utterances out of 3 after initial training. For details, please see:

    /Users/eprzysinda/Documents/MFA/MFA_test_validate_pretrained/unalignable_files.csv

    1 utterances were successfully aligned
    INFO - Done! Everything took 48.94140601158142 seconds

Apr 13 '22 00:04 eprzysinda

The text file looks pretty long; how long is the audio file? Is it possible for you to chunk it into an input TextGrid, as described in https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/corpus_structure.html#textgrid-format?
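A sketch of building such an input TextGrid from hand-chosen chunk boundaries. Assumptions: the segment times come from you (for example, found by listening in Praat), the tier name `speaker1` is a hypothetical speaker name, and the result is saved next to the audio with the same base name (presumably clip_E1_3_3.TextGrid beside clip_E1_3_3.wav). Check the linked corpus-structure page for the exact conventions MFA expects.

```python
def _quote(text: str) -> str:
    """Quote a TextGrid string; embedded double quotes are doubled."""
    return '"' + text.replace('"', '""') + '"'


def chunked_textgrid(segments, duration: float, tier_name: str = "speaker1") -> str:
    """Long-format TextGrid with one interval tier holding transcript chunks.

    segments: (start, end, text) tuples in seconds, sorted and non-overlapping.
    Gaps are filled with empty intervals so the tier spans 0..duration.
    """
    intervals = []
    cursor = 0.0
    for start, end, text in segments:
        if start < cursor or end <= start or end > duration:
            raise ValueError(f"bad segment ({start}, {end})")
        if start > cursor:
            intervals.append((cursor, start, ""))  # untranscribed gap
        intervals.append((start, end, text))
        cursor = end
    if cursor < duration:
        intervals.append((cursor, duration, ""))

    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        "xmin = 0",
        f"xmax = {duration}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f"        name = {_quote(tier_name)}",
        "        xmin = 0",
        f"        xmax = {duration}",
        f"        intervals: size = {len(intervals)}",
    ]
    for i, (start, end, text) in enumerate(intervals, start=1):
        lines += [
            f"        intervals [{i}]:",
            f"            xmin = {start}",
            f"            xmax = {end}",
            f"            text = {_quote(text)}",
        ]
    return "\n".join(lines) + "\n"
```

Keeping each chunk short (on the order of the 30-second file that did align) is the point of the exercise; the empty gap intervals just keep the tier contiguous.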

The logs you attached only had one file processed in them (MFA-Office-E1-1-1-0-30-168708333333335; it looks like it aligned successfully and is 30 seconds long), so I'm not sure whether there are other logs or files in there. It also looks like you're running an older version, so I'd recommend upgrading to the latest and rerunning the alignment with --clean, as there have been a number of fixes and improvements.

Apr 15 '22 15:04 mmcauliffe
