Montreal-Forced-Aligner
TextGrid export step is very slow
Debugging checklist
[X] Have you read the troubleshooting page (https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/troubleshooting.html) and searched the documentation to ensure that your issue is not addressed there?
[X] Have you updated to latest MFA version (check https://montreal-forced-aligner.readthedocs.io/en/latest/changelog/changelog_3.0.html)? What is the output of mfa version?
[X] Have you tried rerunning the command with the --clean flag?
Describe the issue
I am running mfa align on a series of datasets containing about 500k utterances each, using 64 cores in parallel. While doing so, some steps such as the alignment passes become very slow, to the point that the progress bar cannot estimate it/s; the progress count then suddenly jumps by about 5000 in one go every 20~30 minutes.
Apart from that, TextGrid exporting is also surprisingly slow, at 0~1 it/s. For example, in one of my datasets this step alone has been running for 2 days and 8 hours, and it is estimated to take 9~10 more days! In another dataset of 1.7M utterances, the reported it/s rounds to zero.
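As a rough sanity check on those figures, here is the average export rate they imply. This is only a back-of-the-envelope sketch: it assumes the dataset in question is one of the ~500k-utterance ones and takes the midpoint (9.5 days) of the remaining-time estimate. The helper name `implied_rate` is just for illustration and is not part of MFA.

```python
def implied_rate(total_items: int, elapsed_s: float, remaining_s: float) -> float:
    """Average items/second implied by a total run time of elapsed + remaining."""
    return total_items / (elapsed_s + remaining_s)


HOUR = 3600
DAY = 24 * HOUR

elapsed = 2 * DAY + 8 * HOUR   # 2 days 8 hours already spent on TextGrid export
remaining = 9.5 * DAY          # midpoint of the 9~10 day estimate (assumption)
total_utterances = 500_000     # approximate dataset size (assumption)

rate = implied_rate(total_utterances, elapsed, remaining)
print(f"{rate:.2f} it/s")      # prints "0.49 it/s"
```

So the progress bar's 0~1 it/s is consistent with the time estimates, i.e. roughly one TextGrid written every two seconds across all 64 workers combined.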
Also, I'm not sure if this helps, but I have noticed similar issues with mfa g2p. In that case, some runs became completely bottlenecked: it/s could not be estimated and the job was expected to take multiple days, then sometimes the rate would suddenly spike to multiple thousands of it/s and the run would finish immediately.
This looks like some kind of bottleneck, possibly related to multiprocessing or database access.
For Reproducing your issue Please fill out the following:
- Corpus structure
- What language is the corpus in? Multiple languages on each dataset. The examples cited above are for Spanish, French, and English in particular, but I also had similar behavior with Japanese.
- How many files/speakers? Spanish and French, about 500k files and 250~300 speakers each. English, about 1.7M files and 900 speakers.
- Are you using lab files or TextGrid files for input? Text files.
- Dictionary
- Are you using a dictionary from MFA? If so, which one? spanish_mfa, french_mfa, english_mfa
- If it's a custom dictionary, what is the phoneset? N/A
- Acoustic model
- If you're using an acoustic model, is it one download through MFA? If so, which one? spanish_mfa, french_mfa, english_mfa
- If it's a model you've trained, what data was it trained on? N/A
Desktop (please complete the following information):
- OS: [e.g. Windows, OSX, Linux] Linux
- Version [e.g. MacOSX 10.15, Ubuntu 20.04, Windows 10, etc] Ubuntu 20.04
- Any other details about the setup (Cloud, Docker, etc)
Additional context Running on a DGX A100, 128 CPUs (256 cores), 2 TB RAM.
For additional context, all utterances in the examples above should have a duration between 8 and 30 seconds.