
MFA Validate Inconsistent Output

Open shreeshailgan opened this issue 11 months ago • 2 comments

I am running mfa validate on the LibriTTS-train-clean-460 dataset using an IPA dictionary I have. The output contains:

WARNING  288196total OOV tokens       

However, in the generated oov_counts.txt file (see excerpt below), the counts in the 2nd column sum to 32,905. Shouldn't these two numbers be equal? If not, what does 288,196 represent?

--and	151
phoenix	104
--the	99
--a	88
--i	77
--but	67
ion	65
...
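For anyone who wants to reproduce the 32,905 figure, here is a minimal sketch of how the second column can be summed. It assumes each line of oov_counts.txt is a token followed by whitespace (a tab in the excerpt above) and an integer count. The function name `sum_oov_counts` and the sample list are illustrative, not part of MFA.

```python
def sum_oov_counts(lines):
    """Sum the trailing integer count on each 'token<TAB>count' line.

    Assumption: the count is the last whitespace-separated field.
    Lines that do not end in an integer (blank lines, '...') are skipped.
    """
    total = 0
    for line in lines:
        parts = line.rsplit(maxsplit=1)  # split off the last field only
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        total += int(parts[1])
    return total


# The seven rows shown in the excerpt above (not the full file).
sample = [
    "--and\t151",
    "phoenix\t104",
    "--the\t99",
    "--a\t88",
    "--i\t77",
    "--but\t67",
    "ion\t65",
]
excerpt_total = sum_oov_counts(sample)  # 651 for these seven rows

# For the real file (path is hypothetical, adjust to your output directory):
# with open("oov_counts.txt", encoding="utf-8") as f:
#     print(sum_oov_counts(f))
```

The seven rows shown here sum to 651. The 32,905 total comes from running the same sum over the full file.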

shreeshailgan avatar Mar 19 '24 08:03 shreeshailgan

Are you passing configuration options that remove punctuation symbols? What's the full command you're running and what version are you on?

mmcauliffe avatar Mar 22 '24 01:03 mmcauliffe

MFA version: montreal-forced-aligner 3.0.1 pyhd8ed1ab_0 conda-forge

Full command: mfa validate /path/to/data/ /path/to/lexicon --ignore_acoustics --num_jobs 48

shreeshailgan avatar Mar 22 '24 06:03 shreeshailgan