peoples-speech
Speed up DSAlign
Right now, we time out when an audio file fails to align with its transcript within 200 seconds: https://github.com/mlcommons/peoples-speech/pull/27/files#diff-b790cd27585332e1eeca7dab897f1ccd7bcd483181132bd9914f2dd07062534fR401
With this limit, 10% of our files time out during alignment.
One observation is that DSAlign seems to slow to a crawl when the ground-truth transcript does not match what was actually said in the audio (e.g., the transcript is a translation).
One option is to reimplement some part of DSAlign in Cython. But we should really dive deep into what's going on, and see if there's something better we can do.
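As a starting point for the deep dive, here is a minimal profiling sketch using Python's built-in cProfile, so we can see where the time goes before deciding what (if anything) to port to Cython. Note that `slow_align_stand_in` and `profile_call` are hypothetical names made up for illustration; the stand-in is not DSAlign code and would need to be swapped for the real per-file alignment call.

```python
import cProfile
import io
import pstats


def slow_align_stand_in(n):
    """Hypothetical stand-in for one alignment call; not DSAlign code."""
    total = 0
    for i in range(n):
        for j in range(n):
            total += (i ^ j) & 1
    return total


def profile_call(func, *args, top=10):
    """Run func(*args) under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_call(slow_align_stand_in, 300)
    print(report)
```

Running this on one file that aligns quickly and one that times out (e.g., a translated transcript) and comparing the two reports should show whether the slowdown is concentrated in a few functions, which is the case where a Cython rewrite is most likely to pay off.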