mpv
[RFC] Add subrandr SRV3 and WebVTT subtitle renderer
This PR adds support for subrandr, a subtitle rendering library I've been working on for, uh, the past 7 months.
The whole point is to render non-ASS subtitle formats correctly, without conversion, because conversion is lossy most of the time. Currently the supported formats are SRV3 (YouTube's subtitle format) and WebVTT.
Results
I have collected a few videos that use more complex SRV3 subtitles while working on subrandr, so I spent some time making three funny dwm four way comparisons between:
- Top left: mpv with subrandr
- Top right: conversion to ass via my ffmpeg decoder. Alternatively one may use YTSubConverter which ~~I believe supports ruby text via manual layout with font metrics at conversion time~~ (I don't know whether this is actually the case?), this approach is obviously fragile with font fallback in the mix so personally I don't consider it a real solution.
- Bottom left: status quo when playing a video from a URL now, which is subtitles in WebVTT format converted on YouTube's side, converted to ASS by ffmpeg and then played using libass
- Bottom right: YouTube web player, ground truth
Comparisons of example videos
【original anime MV】幽霊船戦【hololive/宝鐘マリン】
Hololive music videos often have ruby text and as such are decent testing material. subrandr should handle SRV3 ruby text correctly although it's not implemented for WebVTT yet.
Worst Teambuilding Exercise Ever
This video contains a lot of positioned subtitles with different types of text shadow, and at this particular moment also exercises line-wrapping, which for SRV3 should be greedy.
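To make "greedy" concrete, here is a minimal sketch of greedy line wrapping. It is not subrandr's code: it assumes width can be approximated by character count and that breaks only happen at whitespace, whereas a real renderer measures shaped glyph advances and uses proper line-break opportunities. The function name `greedy_wrap` is made up for this illustration.

```rust
/// Greedy wrapping: append each word to the current line if it still fits,
/// otherwise close the line and start a new one with that word.
/// Assumption: width is approximated by character count, not glyph advances.
fn greedy_wrap(text: &str, max_width: usize) -> Vec<String> {
    let mut lines: Vec<String> = Vec::new();
    let mut current = String::new();
    let mut current_width = 0usize;

    for word in text.split_whitespace() {
        let word_width = word.chars().count();
        if current.is_empty() {
            // First word on a line is always placed, even if it overflows.
            current.push_str(word);
            current_width = word_width;
        } else if current_width + 1 + word_width <= max_width {
            // The word plus one separating space still fits.
            current.push(' ');
            current.push_str(word);
            current_width += 1 + word_width;
        } else {
            // It does not fit: emit the line and start over with this word.
            lines.push(std::mem::take(&mut current));
            current.push_str(word);
            current_width = word_width;
        }
    }
    if !current.is_empty() {
        lines.push(current);
    }
    lines
}
```

Greedy wrapping never revisits an earlier line, which is why it can produce a ragged last line where a balancing algorithm would not.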
sodapoppin checks out Northernlion's stream
This one is not that special and the ASS conversion is decently close, but the positioning is of course incorrect because it is fully in the video frame and the font size is similarly just slightly off.
For the sake of completeness, the process I used to create the comparisons
Each quadrant is either an mpv or Firefox window, run under X11 with the dwm window manager in master layout with nmaster = 2, with the windows being constructed as follows:
Top left: Use mpv compiled with subrandr and play a downloaded copy of the video with an accompanying srv3 file. (--sub-format srv3 in ytdl)
Top right: Convert srv3 file to ass file via ffmpeg (ffmpeg -i <in>.srv3 <out>.ass) then play the video with mpv and switch to the ASS track. (requires this fork of ffmpeg)
Bottom left: Play a downloaded copy of the video with an accompanying vtt file downloaded from YouTube.
Bottom right:
- Disable browser decorations via userChrome.css.
- Open the video on YouTube.
- Seek to the appropriate time, put the player in theater mode.
- Run below snippet in the console
(() => {
  function full(el, h) {
    el.style.position = "fixed"
    el.style.zIndex = 10000000;
    if(h == "auto") {
      el.style.top = "50%"
      el.style.left = "50%"
      el.style.transform = "translate(-50%, -50%)";
    } else {
      el.style.top = "0"
      el.style.left = "0"
    }
    el.style.backgroundColor = "black"
  }
  let player = document.querySelector("#movie_player");
  full(player, "100vh");
  full(player.querySelector("video"), "auto");
  player.querySelector(".ytp-chrome-bottom").style.display = "none";
})()
- If the subtitles move while the video is paused (I don't remember what this depends on), make sure the video is playing when the screenshot is taken.
I could've probably made separate screenshots in fullscreen mode and then stitched them together inside a markdown table... whatever, this is the first thing I thought of and it probably looks better.
Limitations
Since the library is still in "early" stages, there are a lot of things that are not done correctly yet. This is a non-exhaustive list of the most important ones:
- ~~No DirectWrite font provider. (no fonts will be found on Windows)~~
- No CoreText font provider. (no fonts will be found on macOS)
- ~~Line-breaking is very naive and does unnecessary reshaping, lines are only broken on whitespace instead of following the Unicode line breaking algorithm.~~ Fixed.
- Unicode bidirectional algorithm is not used.
- Any form of vertical text is unsupported.
- ~~Font selection is not compliant with the CSS font matching algorithm, this has been mostly implemented in a branch but is not yet finished.~~ I have since learned that chromium does something pretty close to what I do, so it's staying.
- ~~Subpixel glyph rendering is not implemented so positions are rounded to integer values, this looks wrong on non-HiDPI displays in lower window sizes. I plan to fix this soon.~~
- The rasterizer is not optimized to libass levels, everything is portable (although unsafe) Rust code. The design allows for some GPU acceleration implemented in an optional wgpu rasterizer although mostly only blitting and blurring with actual path rendering being too complicated, and because mpv is designed with software-only OSD rendering in mind it can't be integrated easily. (correct me if I'm wrong)
- ~~There are like two places where it can panic because of unimplemented things but that's easily fixable.~~ Fixed.
The lack of a font provider means that the library will immediately return an error from sbr_renderer_render as soon as it tries to render text, so it's not currently usable on macOS without fontconfig.
mpv integration unresolved issues
- ~~I changed ytdl-hook to request srv3 subtitles, however this is not gated behind conditional compilation or any sort of runtime check so it breaks in builds without subrandr. Maybe a runtime property could be added that Lua can read?~~ Solved with the subrandr-version property.
- subrandr is always passed a dpi of 72. This shouldn't have an impact on subtitle layout with the currently supported formats, but it does impact the debug UI when enabled via SBR_DEBUG=draw_version,draw_perf,draw_layout, and it may break in the future if support for CSS in WebVTT is added, since one could write ::cue { font-size: 20px; }, which must be scaled by the device pixel ratio. I have no idea how to get dpi information in get_bitmaps without digging into mpv_global, which contains a warning specifically telling you not to do that.
- ~~Oh and I almost forgot, currently I forcefully un-align the stride of the resulting mp_image which could probably cause issues down the line on some platforms, so that should probably be changed.~~ Fixed.
Building
So if you got this far and are on Linux with FreeType, HarfBuzz (with FreeType support), and Fontconfig libraries installed, here's how you build and install the library:
git clone https://github.com/afishhh/subrandr
cd subrandr
cargo xtask install --prefix /<path>/<to>/<prefix>
You also need Rust installed; the latest stable toolchain should work. The prefix should be set to a writable path where the library should be installed. The install step creates the following filesystem structure there:
include/
  subrandr/
    <headers>
lib/
  pkgconfig/
    subrandr.pc
  libsubrandr.a
  libsubrandr.so
Then you need to make sure <prefix>/lib/pkgconfig is on your pkg-config path when running meson setup (for example via the --pkg-config-path meson arg).
After building the library itself, you should be able to build mpv as usual. Passing -D subrandr=enabled to meson setup ensures that the library is correctly detected; if it is not, you will get an error instead of a build that silently lacks subrandr. ~~The library itself is linked statically, with mpv inheriting the dynamic library dependencies.~~ Static linking is never a good idea somehow.
Alternatives
Is this all necessary? Other possible approaches could be:
- Adding extensions to libass for things like ruby text, doesn't account for format idiosyncrasies like positioning which is especially important with WebVTT where the positioning algorithm is very different (step 10 onward).
- Conversion in ffmpeg, see my patch, it does a decent job but has no chance of supporting stuff like ruby text because loading fonts in a decoder would be very strange and probably never get merged into mainline ffmpeg. Did I mention it's hacky and fragile yet?
- Do nothing, the sad truth is that these formats, even though they're quite powerful, are seldom used to even a fraction of their full potential. YouTube themselves don't support some of SRV3's features on non-Web players (crazy idea would be to add subrandr to revanced, food for thought).
Naturally, after spending months working on this, I am slightly biased and believe a separate renderer is worth it because it allows iterating on other subtitle formats without having to worry about the unhinged format known as Advanced Substation Alpha. At first I was developing ASS support in parallel to SRV3 in subrandr, but then realized how much horrible complexity ASS adds and purged it from the code base. This, in my eyes, confirmed that it is significantly simpler to have other formats handled separately.
Thanks for reading, hope you like my work :)
Excellent job! This works almost perfectly on the few videos I tried. Besides the few limitations you've listed, ~~one other thing I noticed was that alignment is a bit off in the video I tested with at 00:59.~~ Scratch that, subrandr matches Chromium output, it's actually Firefox that's dorky here. Probably sensible to treat Chromium as the ground truth here
You should set PREFIX to a path where $PREFIX/lib/pkgconfig will be on PKG_CONFIG_PATH, $PREFIX/lib will be on the linker library path, and $PREFIX/include will be on the include path. (I believe /usr/local/ works on "usual" Linux distributions but can't test)
This is actually not necessary, you can set PREFIX to any directory then point meson to the pkgconfig file. For example I installed it to PREFIX=/opt/subrandr and configured meson with meson setup build --pkg-config-path /opt/subrandr/lib/pkgconfig. This avoids littering your /usr/local files or needing to change environment variables.
Alternatives -> Nothing
Didn't FFMPEG finally add WebVTT support just 2 days ago?
By "Nothing" I meant that we could "do nothing", not that there are no alternatives; there wouldn't be an alternatives section with alternatives if there weren't alternatives. Looking at ffmpeg-devel I don't see any patches for improving styling support. ffmpeg has supported simple WebVTT, throwing out ~~all~~ most of the styles, for a long time already.
Also I fixed the stride hack so now the resulting mp_image has proper aligned stride.
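For readers unfamiliar with the term, the following sketch shows what an aligned stride means in practice. It is not the code from this PR: the 64-byte alignment used in the usage note is only an assumed example value, and both function names are hypothetical.

```rust
/// Round the byte length of one image row up to the next multiple of `align`.
/// `align` must be a power of two (the bit trick below relies on that).
fn aligned_stride(width_px: usize, bytes_per_pixel: usize, align: usize) -> usize {
    assert!(align.is_power_of_two(), "alignment must be a power of two");
    let row_bytes = width_px * bytes_per_pixel;
    (row_bytes + align - 1) & !(align - 1)
}

/// Copy tightly packed rows into a buffer whose rows start `stride` bytes apart.
/// The padding bytes at the end of each row are left as zero.
fn repack_rows(src: &[u8], row_bytes: usize, height: usize, stride: usize) -> Vec<u8> {
    assert!(row_bytes > 0 && stride >= row_bytes);
    assert_eq!(src.len(), row_bytes * height);
    let mut dst = vec![0u8; stride * height];
    for (y, row) in src.chunks_exact(row_bytes).enumerate() {
        dst[y * stride..y * stride + row_bytes].copy_from_slice(row);
    }
    dst
}
```

With an assumed alignment of 64 bytes, a 1921-pixel-wide BGRA row (7684 bytes) gets a stride of 7744 bytes, while a 1920-pixel-wide row (7680 bytes) needs no padding.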
The lack of a font provider means that the library will immediately return an error from sbr_renderer_render as soon as it tries to render text, so it's not currently usable on platforms other than "unix with fontconfig".
Hi I wonder why windows is blocked by this reason. Fontconfig is available on windows too.
Well if that's the case that would make things easier. After downloading many dlls from msys packages I even got it to compile, run, and find the config file in wine but it doesn't find any fonts. I'm hoping that this actually does work on Windows but someone on Windows would have to actually check that. In particular I have no idea what encoding fontconfig returns in FC_FILE on Windows and am currently wishfully assuming it's UTF-8.
Implemented WebVTT snap-to-lines = false layout and Unicode line breaking. This means I'm now slightly less afraid of breaking people's WebVTT rendering.
Is there anything to do on the mpv side before this is ready for review? I was thinking that it may be confusing if people using very customized subtitle options have their customization ignored in WebVTT (because sd_sbr doesn't implement them). Maybe initially we could use subrandr only for SRV3, so that enabling subrandr doesn't cause regressions with WebVTT. ytdl_hook would still start ignoring those options by preferring SRV3, but I guess that is less difficult to make configurable at runtime.
Bumping this, I've since improved subrandr's line-height handling and ruby positioning, also am in the process of implementing a mini web engine to match browser styling more precisely (along with improving the layout subsystem on master in the process).
I think this PR is pretty much ready, there's some things that could be done in the future but I don't think there's anything blocking (my main worry is still incompatibility with user ASS styles but that's mostly fixed by --sub-demuxer=lavf existing).
Sorry for the delay, I might have some time this weekend to look at this. One thing that stands out is how we streamline building and shipping a new library. I was thinking of defining a dummy meson.build consisting of a custom target that runs cargo build and declares a dependency with the built library location, so we can use it in mpv's meson build as a subproject. My point is that we should enable it in our CI builds, and this would probably be easy to connect to the existing pipeline.
I looked into doing the meson.build stuff, the issue with using declare_dependency() is that it duplicates all the information in the preexisting pkg-config file and the logic in generating it (also such a meson file must re-implement a crude meson target_machine -> rustc target triple translation for cross compilation).
It just seems like a decent amount of work and it would have to be updated whenever something linking-related is changed. I can try but I just don't know whether I want to maintain all that honestly. It would be better if meson could just read that pkg-config file post-build but that's practically the same thing as just building the library as part of the CI job so I wonder whether that would be simpler overall due to not duplicating internal logic.
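To give an idea of the "crude" translation mentioned above, here is a sketch of the kind of lookup table it amounts to. It is written in Rust purely for illustration (a real version would live in meson.build), the function name is hypothetical, and the table assumes MinGW-style toolchains on Windows. The inputs are the strings Meson reports from a machine object's cpu_family() and system().

```rust
/// Map a Meson (cpu_family, system) pair to a rustc target triple.
/// Returns None for any combination not listed here.
/// Assumption: Windows builds use GNU (MinGW) toolchains, not MSVC.
fn rust_target_triple(cpu_family: &str, system: &str) -> Option<&'static str> {
    match (cpu_family, system) {
        ("x86_64", "linux") => Some("x86_64-unknown-linux-gnu"),
        ("aarch64", "linux") => Some("aarch64-unknown-linux-gnu"),
        ("x86", "windows") => Some("i686-pc-windows-gnu"),
        ("x86_64", "windows") => Some("x86_64-pc-windows-gnu"),
        ("aarch64", "windows") => Some("aarch64-pc-windows-gnullvm"),
        ("x86_64", "darwin") => Some("x86_64-apple-darwin"),
        ("aarch64", "darwin") => Some("aarch64-apple-darwin"),
        _ => None,
    }
}
```

Every new platform or libc variant needs another row, which is the maintenance cost being weighed against simply building the library as a separate CI step.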
I added subrandr to the CI builds for Linux, x86_64 mingw64, and x86_64 msys2. Currently i686 and aarch64 Windows targets don't work which I track on my side in https://github.com/afishhh/subrandr/issues/31.
Fixed ARM64 implibs in implib-rs upstream, so aarch64 msys2 can now be tested.
Have you tried making win32 builds work too?
[sbr warn srv3::parse] Unknown element encountered in head: ws
[sbr warn srv3::parse] Unknown event attribute ws
I see no way to hide these warnings. Please consider adding a way to hide them. Thanks.
All logging should go through a callback and use mp_log to print them.
I had hoped to be able to avoid introducing a C api for logging right now due to unresolved questions about thread safety but I can provide one with the smallest possible set of guarantees to keep all possibilities open.
As a workaround, they can be hidden by setting SBR_LOG=error.
Honestly this is a blocker for me. Libraries that spam stdout/stderr in your application are evil. See how logging is done for ffmpeg https://github.com/mpv-player/mpv/blob/bd2118026b9b46da5633724b34e7849eec80f72c/common/av_log.c
And as far as mpv is concerned, the callback can be called from any thread; thread safety is on our side.
I agree and am working on it right now.
Unlike libav, logging in subrandr is specifically non-global and done by passing around a reference. This has some downsides internally, but it means this global state is not going to be necessary for mpv.
That's good enough then. The thread safety concerns I was mentioning are more like "do I want to allow modifying the logger concurrently", for now I will restrict setting the logger to immediately post-library-creation.
Ah, I see. You could have a params struct passed to sbr_library_init with some initialization-only options, like logging. I guess there are many solutions; it depends on how you prefer to handle it.
Alright I added proper logging. Followed ffmpeg in making library warnings MSGL_V so these shouldn't show up by default anymore.
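As a rough illustration of the level demotion described above, the sketch below maps library severities to host-side levels. Only the warning-to-MSGL_V mapping comes from this thread; the enums, the other mappings, and the assumption that default verbosity stops at "info" are all hypothetical and do not reflect subrandr's or mpv's actual API.

```rust
/// Hypothetical severities a library might report through its log callback.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LibLevel { Trace, Debug, Info, Warn, Error }

/// Host-side levels, loosely named after mpv's MSGL_* constants.
/// Declared from most to least severe so the derived ordering can be compared.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum HostLevel { Err, Warn, Info, V, Debug, Trace }

/// Translate a library level into the level the host should log it at.
fn to_host_level(level: LibLevel) -> HostLevel {
    match level {
        LibLevel::Error => HostLevel::Err,
        // Library warnings are demoted to verbose so they stay quiet by default.
        LibLevel::Warn => HostLevel::V,
        LibLevel::Info => HostLevel::V, // assumption, not stated in the thread
        LibLevel::Debug => HostLevel::Debug,
        LibLevel::Trace => HostLevel::Trace,
    }
}

/// Assumption: the host prints everything up to and including Info by default.
fn shown_by_default(level: HostLevel) -> bool {
    level <= HostLevel::Info
}
```

Under these assumptions a parser warning such as an unknown SRV3 attribute is only printed when the user raises verbosity, while errors still show up at default settings.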
Linux CI should fail now because the package in the image does not contain the logging API yet, and I think we should defer updating it until everything else is ready (so I don't end up with 12 patch releases).
Speaking of "everything being ready", I am wondering whether textsub is the appropriate name for the new demuxer. I initially named it that because it just passed through text unaltered based on the extension, but now it uses subrandr-specific probing functions. Maybe it should be named (demux_) sbr?
Convention is to name it demux_<format>, so sbr would not be accurate here. textsub is indeed a bit vague, but I guess it tells what it actually does. But if you have better ideas for organizing this, feel free.
You should be using tracing with tracing_subscriber registry/layer, and should have been from the start.
Edit: From mpv's side, that would take care of logging for any future rust dependency using tracing or log, which is what the overwhelming majority of the rust ecosystem uses.
Appreciate the input, I know what tracing is and I made a conscious decision not to use it.
I do not want to argue here about what you think I should be doing, if you make a C library in Rust feel free to use tracing.
Thanks!
I know what tracing is and I made a conscious decision not to use it.
That's obviously your choice to make indeed.
I do not want to argue here about what you think I should be doing...
That's a very weird thing to reply with. It's as if you don't know what "RFC" stands for.
If you want other projects to depend on your library, you kinda have to deal with comments and suggestions like mine, especially when you are the one pushing for that dependence, don't you think?
That's correct, which is why I said that I appreciate the input. I replied like this because your message was not worded like a suggestion, you said that "I should have" been doing something seemingly without any consideration for why I wasn't. That sounded like you did not think that "it was my choice to make" but like it was imperative that I use tracing.
Sorry for misunderstanding your intentions, let's not clutter up these comments with this any longer.
Ah, I thought otherwise due to entries such as demux_lavf or demux_libarchive. If that's the case then I don't have any better ideas. textsub does indeed sound vague, but I guess you're right, that is what it actually does.
Good point. I mean, if a demuxer is used only to interface with an external demuxer, it can be named this way. I guess there is no strict rule... I don't mind either way.
Since the demuxer is mostly a workaround for:
- the ffmpeg vtt demuxer is very bare-bones and doesn't support any blocks other than cues
- subrandr doesn't yet support decoding subtitles packetized in the way the ffmpeg vtt demuxer returns them (designing an API for that which I would want to commit to doesn't sound easy)
- there does not exist an srv3 demuxer in ffmpeg
I think specializing it to subrandr by naming it demux_sbr or something may be more fitting but honestly I'm not sure.
If points 1 and 2 from my list are ever fixed/implemented then theoretically we could clean this up by using ffmpeg's demuxing for vtt and we'd be left with just a simple demux_srv3.
Also I cleaned up some of the library's rough edges around system font handling and glyph caching that I wanted to get done before this is merged. Font selection should now be more consistent and css-fonts-4-compliant across operating systems, it will even poll fontconfig for changes to your config which is cool.
Renamed demux_textsub to demux_sbr, also had to rebase on https://github.com/mpv-player/mpv/commit/0232ff27a80ae0ac31de12989294b02a6ea1cffa to fix merge conflicts so git range-diff aec3232...36270b0 to review (or 2bc5a38 if you don't trust me since github ui apparently ate aec3232).
screenshots of the tests on macOS:
perf data 5k fullscreen with --no-config.
perf data 5k fullscreen with screenshots at 5k: