alass Troubleshooting wrong alignment

I was wondering how the language-agnostic part works, since in my first few quick tests it generated totally wrong output for Dutch subtitles, but perfect output for English subs. The Dutch output had the first 5 subtitles all starting at 00:00:00,000, and all subsequent subtitles were far too early compared to the audio.

I guess this could still be caused by some variable other than language, since I only tested 2 files. Which makes me wonder: is there a --verbose switch or anything that can help me debug this? How do you recommend approaching this issue?

Really great project btw, and thumbs up on MPV! It is also my main media player on Linux ;)

Oct 10 '19 16:10 davidde

This means that the Dutch subtitle differs more from the result of the voice activity detection than the English subtitle does (perhaps many more extra or missing lines?). It would be interesting to know which movie you used. Movies with more action scenes and loud background music have a higher chance of failure than "quiet" ones.

If you know that the framerate is correct in the original subtitle file, you could try --disable-fps-guessing. Framerate guessing is usually the step that goes wrong.

In this case you can use a trick: align the incorrect Dutch subtitle to the corrected English subtitle (without any other flags). This has a very high chance of success.
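To make the two suggestions above concrete, here is a rough sketch that only builds the command lines and prints them. The file names (english.srt, dutch.srt and so on) and the helper alass_cmd are made up for illustration; only the alass argument order (reference, incorrect subtitle, output) and the --disable-fps-guessing flag come from this thread.

```python
import shlex


def alass_cmd(reference, incorrect, output, *flags):
    """Build an alass command as an argument list (nothing is executed here)."""
    return ["alass", *flags, reference, incorrect, output]


# Step 1: sync the English subtitle against the movie audio.
step1 = alass_cmd("Movie.mp4", "english.srt", "english_synced.srt")

# Step 2: use the synced English subtitle as the reference for the Dutch one.
step2 = alass_cmd("english_synced.srt", "dutch.srt", "dutch_synced.srt")

# Variant: sync against the audio, but skip framerate guessing.
no_fps = alass_cmd(
    "Movie.mp4", "dutch.srt", "dutch_synced.srt", "--disable-fps-guessing"
)

for cmd in (step1, step2, no_fps):
    print(shlex.join(cmd))
```

To actually run one of them, something like subprocess.run(step2, check=True) should do, assuming alass is on your PATH.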

There is no --verbose flag or anything. If the framerate guessing is indeed the step that went wrong, printing the scores for the 7 tested framerate ratios might provide some insight (giving the confidence of the guess). I don't think there is any other usable information for a human.
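As a rough illustration of why a wrong framerate guess hurts so much, the sketch below rescales a timestamp by a set of candidate ratios. Note that the three framerates are an assumption: the thread only says that 7 ratios are tested, and three common framerates happen to give 6 ordered pairs plus the identity ratio. The function names are made up, and this is not meant to mirror the actual alass code.

```python
from itertools import permutations

# ASSUMPTION: these candidate framerates are not stated in the thread.
FRAMERATES = (23.976, 24.0, 25.0)


def candidate_ratios(framerates=FRAMERATES):
    """Return 1.0 plus a/b for every ordered pair of distinct framerates."""
    return [1.0] + [a / b for a, b in permutations(framerates, 2)]


def rescale_ms(timestamp_ms, ratio):
    """Rescale a timestamp in milliseconds by a framerate ratio."""
    return round(timestamp_ms * ratio)


ratios = candidate_ratios()
one_hour_ms = 60 * 60 * 1000
for ratio in ratios:
    shifted_s = rescale_ms(one_hour_ms, ratio) / 1000
    print(f"ratio {ratio:.5f}: 01:00:00 maps to {shifted_s:.1f} s")
```

With a ratio of 25/24, a subtitle at the one-hour mark moves by 150 seconds, so a wrongly chosen ratio cannot be repaired by a constant shift.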

Oct 10 '19 17:10 kaegi

The "block of XXX subtitles shifted by XXX" lines in the output give an impression of how many splits the algorithm makes and how far apart they are placed.

Oct 10 '19 17:10 kaegi

And I forgot: there is a special mode to debug the voice activity detection by passing an underscore in place of the subtitle file!

alass Movie.mp4 _ voiceactivity.srt

This generates a subtitle file containing the time spans where speech is likely. It is usually not that accurate (given music or background noise), but there should be enough lines that correspond to valid dialog.
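If you want a quick summary of such a file instead of reading it by eye, a small sketch like the following might help. It only relies on the standard SRT timing line format; the sample text is invented (the thread doesn't show what alass writes as the subtitle text), and the function names are hypothetical.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def to_ms(hours, minutes, seconds, millis):
    """Convert timestamp fields to milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def read_spans(srt_text):
    """Return (start_ms, end_ms) for every timing line found in SRT text."""
    spans = []
    for match in TIMING.finditer(srt_text):
        fields = match.groups()
        spans.append((to_ms(*fields[:4]), to_ms(*fields[4:])))
    return spans


def summarize(spans):
    """Count the spans, sum their durations and count spans starting at 0 ms."""
    return {
        "spans": len(spans),
        "speech_ms": sum(end - start for start, end in spans),
        "zero_starts": sum(1 for start, _ in spans if start == 0),
    }


# Invented sample; for a real file use open("voiceactivity.srt").read().
sample = """1
00:00:01,000 --> 00:00:03,500
speech

2
00:00:10,250 --> 00:00:12,000
speech
"""

summary = summarize(read_spans(sample))
print(summary)
```

The same parser can be pointed at an alass output file: a zero_starts count above 1 corresponds to the "several subtitles starting at 00:00:00,000" symptom from the first post.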

Oct 10 '19 18:10 kaegi

Ok, thanks for the pointers. The cases I mentioned were not generated from the same movie. I've now tried synchronizing Dutch subtitles of the movie that generated the perfect English output, and it also generated a bad Dutch output. So that at least seems to suggest it is less reliable for non-English subs, which is weird since voice detection should be no different for non-English.

When I based the Dutch output on English reference subs, output was much better, though not completely flawless. If I can find the time, I might do some more testing with more subs/languages to see if I can narrow the problem down.

Oct 10 '19 22:10 davidde

Another thing you should try is --split-penalty with values like 1, 2, 5, 10, 20, 30 or 50. It might be that the split penalty is too low (or too high; the default is 7, which is rather low). Your case seems very strange. I hope you can find the problem.

Using the voiceactivity.srt as the reference is equivalent to using the movie audio. It just skips the extraction step, so you can play around with the values faster.
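A sweep over those values could be scripted roughly like this. It is only a sketch: dutch.srt and the out_penalty_N.srt output names are made up, and by default the script just prints the commands (dry run) rather than running alass.

```python
import shlex
import subprocess

# Split penalty values suggested in this thread.
PENALTIES = (1, 2, 5, 10, 20, 30, 50)


def sweep_commands(reference, incorrect, penalties=PENALTIES):
    """Build one alass command per split penalty, each with its own output file."""
    commands = []
    for penalty in penalties:
        output = f"out_penalty_{penalty}.srt"  # hypothetical naming scheme
        commands.append(
            ["alass", "--split-penalty", str(penalty), reference, incorrect, output]
        )
    return commands


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# voiceactivity.srt as the reference avoids re-extracting the audio every run.
commands = sweep_commands("voiceactivity.srt", "dutch.srt")
run_all(commands)
```

Calling run_all(commands, dry_run=False) would execute them one after another, assuming alass is installed; you then compare the output files by hand in your player.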

Oct 10 '19 22:10 kaegi

Great, thanks for the help. I'll see if I can get better matching subs.

Oct 10 '19 22:10 davidde
