
Poor support for locale specific subtitles

Status: Open. artjomsR opened this issue 2 years ago (2 comments).

Important Information

Provide the following information:

  • mpv version: nightly mpv v0.36.0-240-g67368ac5, built on Sep 3 2023 00:14:58
  • Platform and Version: Win 10

Reproduction steps

  • Start with a subtitle file whose name matches the video, e.g. 123.mp4 and 123.en.srt. mpv loads these subtitles by default and correctly recognises their language (en).
  • Rename the subtitle file to 123.en-US.srt and open 123.mp4 in mpv.

Expected behavior

Treat locale-specific subtitles the same way as regular subtitles (i.e. as if they were simply .en.vtt and the -US part didn't exist):

  • Recognise such .en-XX.vtt subtitles without sub-auto=fuzzy being set.
  • Recognise the language of the subtitles as en despite the locale suffix.
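As a minimal sketch of the second point, assuming the language tag has already been cut out of the filename: reducing a tag such as en-US to its primary language subtag is enough for a preference list like slang=en to match. The helpers primary_language() and matches_slang() are hypothetical illustrations, not mpv functions, and they do not model how mpv itself compares language codes (e.g. 2- vs 3-letter codes).

```python
import re
from typing import Iterable


def primary_language(tag: str) -> str:
    """Hypothetical helper: keep only the primary language subtag.

    'en-US' -> 'en', 'pt_BR' -> 'pt', 'en' -> 'en' (lowercased).
    """
    # Split once on '-' or '_' and keep the part before the separator.
    return re.split(r"[-_]", tag, maxsplit=1)[0].lower()


def matches_slang(tag: str, slang: Iterable[str]) -> bool:
    """Hypothetical check: is the tag's primary language in a slang-style list?"""
    wanted = {code.lower() for code in slang}
    return primary_language(tag) in wanted


if __name__ == "__main__":
    print(matches_slang("en-US", ["en"]))  # True
    print(matches_slang("de-DE", ["en"]))  # False
```

The region part is dropped rather than stored, in line with the note below that extracting the locale itself isn't needed.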

Actual behavior

mpv doesn't load locale-specific subtitles by default (sub-auto=fuzzy is required to do so).

  • Additionally, the language of such subtitles is no longer recognised as en but as unknown. This in turn causes problems with other subtitle options (e.g. slang=).

Notes

  • These steps can also be reproduced with .vtt (and probably other) subtitle file extensions.
  • This can be reproduced with any subtitles named in the form .xx-yy.srt (or with any extension other than .srt), where xx is the 2-character language code and yy is the 2-character locale (region) code. As of now, it seems that mpv only cleanly supports .xx.srt files.
  • I don't think it's worth the effort to extract the locale (e.g. US) from the filename. It would be sufficient to recognise that, e.g., en-US subtitles exist and are in English.
  • The reason I'm creating this issue is that subtitles are sometimes provided with a locale suffix like the above (rather than plain .en.srt), so recognising them by default saves messing around with renaming them.
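The filename shape described in these notes could be recognised with a single pattern that accepts both <stem>.xx.ext and <stem>.xx-yy.ext for any extension, capturing only the 2-letter language and discarding the region. This is a hypothetical Python sketch (SUFFIX and sub_language() are illustrative names); accepting "_" as a separator is an added assumption, and mpv's real matching is C code in player/external_files.c that may work quite differently.

```python
import os
import re
from typing import Optional

# Hypothetical pattern for the part after the video's stem: ".xx.ext" or ".xx-yy.ext".
# Only the 2-letter language is captured; the region is matched but discarded.
# "_" as a separator is an assumption. Any single extension is accepted.
SUFFIX = re.compile(r"^\.([a-z]{2})(?:[-_][a-z]{2})?\.[^.]+$", re.IGNORECASE)


def sub_language(video: str, sub: str) -> Optional[str]:
    """Return the subtitle's language if `sub` belongs to `video`, else None."""
    stem, _ = os.path.splitext(os.path.basename(video))
    name = os.path.basename(sub)
    if not name.startswith(stem):
        return None
    # The remainder must start with "." so that e.g. 1234.en.srt does not match 123.mp4.
    match = SUFFIX.match(name[len(stem):])
    return match.group(1).lower() if match else None


if __name__ == "__main__":
    for sub in ["123.en.srt", "123.en-US.srt", "123.en-US.vtt", "1234.en.srt"]:
        print(sub, sub_language("123.mp4", sub))
```

Run as a script, this reports en for the first three names and None for 1234.en.srt.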

artjomsR, Sep 10 '23 07:09

mpv's current filename parsing doesn't attempt to handle anything formatted like en-US, but supporting it shouldn't be terribly complicated. It's probably a reasonably common format.

Dudemanguy, Sep 10 '23 14:09

+1

From trial and error, and from looking at guess_lang_from_filename() and related code in player/external_files.c:

  • --sub-auto=exact seems to match /\.[a-z]{2,3}\.srt$/
  • --sub-auto=fuzzy seems to match /\.[^.]+\.srt$/
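To compare the two reported patterns on a few sample names, the sketch below runs both in Python. These regexes are the empirical approximations from this comment, not code taken from mpv, and they only model the language-suffix part of the name; whether the rest of the name matches the video is not checked.

```python
import re

# Approximations reported above; NOT taken from mpv's source.
EXACT = re.compile(r"\.[a-z]{2,3}\.srt$")
FUZZY = re.compile(r"\.[^.]+\.srt$")

SAMPLES = ["123.en.srt", "123.eng.srt", "123.en-US.srt", "123.forced.srt"]


def classify(name: str) -> dict:
    """Report which of the two approximate patterns accept a filename."""
    return {
        "exact": EXACT.search(name) is not None,
        "fuzzy": FUZZY.search(name) is not None,
    }


if __name__ == "__main__":
    for name in SAMPLES:
        print(name, classify(name))
```

Under these approximations, 123.en-US.srt is accepted only by the fuzzy pattern, which matches the behaviour described in the issue.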

Does anyone have filename examples that justify the necessity of these two options as they stand?

Something like /\.[a-z][a-z_-]{0,4}[a-z]\.srt$/i would seem like a good default to replace them both.
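A quick check of the proposed case-insensitive pattern against a few sample names shows what it would accept. This is only an illustrative Python harness, not mpv behaviour. As written, the pattern caps the tag at 6 characters, so it accepts forced (6 letters) but rejects english (7) and script-qualified tags such as zh-Hans.

```python
import re

# Pattern proposed above, compiled case-insensitively (the /i flag).
PROPOSED = re.compile(r"\.[a-z][a-z_-]{0,4}[a-z]\.srt$", re.IGNORECASE)


def accepts(name: str) -> bool:
    """True if the proposed pattern would accept this subtitle filename."""
    return PROPOSED.search(name) is not None


if __name__ == "__main__":
    for name in ["123.en.srt", "123.en-US.srt", "123.pt_BR.srt",
                 "123.forced.srt", "123.english.srt", "123.zh-Hans.srt"]:
        print(f"{name:18} {accepts(name)}")
```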

forthrin, Feb 12 '24 07:02