Montreal-Forced-Aligner
MFA Validate Inconsistent Output
I am running mfa validate on the LibriTTS-train-clean-460 dataset using an IPA dictionary I have. The output contains:
WARNING 288196 total OOV tokens
However, in the oov_counts.txt file it generates (see snapshot below), the counts in the second column sum to 32,905. Shouldn't these two numbers be equal? If not, what does 288,196 represent?
--and 151
phoenix 104
--the 99
--a 88
--i 77
--but 67
ion 65
...
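To double-check the 32,905 figure, and to see how much of it comes from punctuation-prefixed entries such as "--and", here is a minimal Python sketch. It assumes each line of oov_counts.txt is a token followed by its count, separated by whitespace, as in the snapshot. The function name summarize_oov_counts is hypothetical and not part of MFA; the sketch only re-adds the file's own numbers and does not explain how MFA computes its warning total.

```python
import sys
from collections import Counter


def summarize_oov_counts(lines):
    """Sum the count column of oov_counts.txt-style lines (hypothetical helper).

    Assumes each entry is "<token> <count>" separated by whitespace.
    Returns (total_tokens, distinct_types, punct_prefixed_tokens).
    """
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines, e.g. a "..." placeholder
        token, count = parts
        try:
            counts[token] += int(count)
        except ValueError:
            continue  # second column is not an integer; ignore the line
    total = sum(counts.values())
    # Tokens that start with a non-alphanumeric character (e.g. "--and")
    # hint that punctuation was left attached when words were looked up.
    punct_prefixed = sum(c for t, c in counts.items() if not t[0].isalnum())
    return total, len(counts), punct_prefixed


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        total, types, punct = summarize_oov_counts(f)
    print(f"{total} OOV tokens across {types} distinct types "
          f"({punct} from punctuation-prefixed entries)")
```

Run it as `python summarize_oov.py /path/to/oov_counts.txt` (script name assumed). On just the seven rows shown in the snapshot it reports 651 tokens across 7 types, 482 of them from "--"-prefixed entries.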
Are you passing configuration options that remove punctuation symbols? What's the full command you're running and what version are you on?
MFA version
montreal-forced-aligner 3.0.1 pyhd8ed1ab_0 conda-forge
Full command
mfa validate /path/to/data/ /path/to/lexicon --ignore_acoustics --num_jobs 48