
Subtitle Configuration: FontFamily (UWP)

Open softworkz opened this issue 1 year ago • 56 comments

Is this working on your side?

Initially I wanted to get this working with a custom font supplied as an appx resource, but then I realized that it doesn't even work when setting a system-installed font:

FfmpegMss.Configuration.Subtitles.SubtitleStyle.FontFamily = "Impact";

At some point I got really desperate and tried this:

    foreach (var track in currentPlaybackItem.TimedMetadataTracks)
    {
        foreach (var mediaCue in track.Cues)
        {
            var cue = (TimedTextCue)mediaCue;
            cue.CueStyle.FontFamily = "Impact";

            foreach (var textLine in cue.Lines)
            {
                foreach (var subformat in textLine.Subformats)
                {
                    subformat.SubformatStyle.FontFamily = "Impact";
                }
            }
        }
    }

But there was no way to change the font. Everything else works: font size, bold, foreground color, etc.

Is it working on your side?

softworkz avatar Nov 02 '23 22:11 softworkz

Internally we use TimedMetadataTrack and its related APIs to handle subtitles. There are multiple reasons we do so; you can dig up the PR for subtitle support from 5 years ago.

This means that WinRT has some degree of authority over how subtitles are presented. I always wondered how this font thing would work, taking Windows settings into account. It may be that your Windows settings are overriding the custom settings.

I remember this used to work at some point, I will look into it again.

brabebhin avatar Nov 04 '23 19:11 brabebhin

you can dig up the PR for subs support from 5 years ago.

I remember this used to work at some point,

I had already looked at the code, mainly to find out how you are setting custom fonts (which are not installed on the system), and yes, that made me think the same: It must have worked at some time.

To clarify: I don't mean to say that it's a problem of FFmpegInteropX. I get the same results when adding the subtitle track via Windows.Media methods directly.

So, what I'm rather wondering: is it only me where it's not working (maybe due to some system configuration or certain UWP project/appx settings) - or is it a general problem (maybe with recent SDKs)?

softworkz avatar Nov 04 '23 19:11 softworkz

I haven't seen major differences in the way APIs behave between SDK versions. I think this is more of a Windows problem. It may also be that the font family property is meant for the use case in which the application renders the subtitles itself, not when it's done through the MPE. If it was working at some point, that may actually have been a bug in Windows that got fixed.

I would love to have the subtitles rendered fully within FFmpegInteropX, but we are missing a libass build for that. After that, we could get rid of the MPE completely by rendering things in frame server mode, which provides more flexibility and performance.

brabebhin avatar Nov 04 '23 20:11 brabebhin

I think this is more of a Windows problem.

Yup, very well possible.

It may also be that the font family property is meant for the use case in which the application renders the subtitles itself, not when it's done through the MPE.

But all other text style options are working properly.

Can you confirm that it's the same on your side, i.e. that you can set the font to Bold but it ignores the FontFamily setting?

softworkz avatar Nov 04 '23 20:11 softworkz

I would love to have the subtitles rendered fully within FFmpegInteropX, but we are missing a libass build for that. After that, we could get rid of the MPE completely by rendering things in frame server mode, which provides more flexibility and performance.

How exactly would you want to do that? For proper libass rendering, you need to generate overlay frames at the same fps as the video (or at least half), and I don't think that would work using ImageCue items. Blending the frames together is too expensive an operation, I suppose, so I see only the following two ways:

  1. Let libass operate on the D3D surfaces directly, so it paints the subs right onto the video frames
  2. Let libass paint on transparent frames, convert and upload them as D3D surfaces and then...? I'm unsure how the next part works; I haven't looked it up. I thought you would supply the D3D surfaces to the MediaPlayer as video frames - but could you supply two D3D surfaces where one is overlaid on the other?

softworkz avatar Nov 04 '23 20:11 softworkz

One of the kinks of this project is that we don't really have it all figured out. Until we link libass and see what it does, we don't really know how it will go.

brabebhin avatar Nov 04 '23 21:11 brabebhin

I know what libass does. But I don't know how you are supplying video frames to MediaPlayer :-)

softworkz avatar Nov 04 '23 22:11 softworkz

There are multiple ways; it depends on the video decoder mode.

We have DirectX-based decoding, where we can access the texture containing the video data.

There's pass-through, which is basically the system doing the decoding - a total black box.

Software decoding inside ffmpeg is pretty self-explanatory.

We could render subtitles to a directx image and expose the subtitle streams to MediaPlaybackItem the same way we do for audio and video, and we would feed back raw images.

Or we could overlay the subtitles using an ffmpeg filter. There are multiple ways really, but until we have libass linked in and get a feel for it, it's pretty difficult to say which way to go.

This would also entail a complete rewrite of the subtitles engine, which may or may not be productive in the long run.

brabebhin avatar Nov 04 '23 22:11 brabebhin

My subtitles PR for ffmpeg (https://github.com/ffstaging/FFmpeg/pull/18) includes a new text2graphicsub filter which outputs the rendered subtitles as video frames. We use it for subtitle burn-in at the server side. You can use the output with the ffmpeg overlay filter to do software overlay, or you can use the hwupload filter and do hardware overlay. But there's nothing like an "overlay_d3d" filter, only filters specific to a certain hw context, like overlay_cuda and overlay_qsv. With those it works fine, but it requires specific handling and filter setup depending on the hw context.

What I meant above (re: two D3D surfaces) would be avoiding any burn-in processing and having the overlay done "on-screen" by showing the two surfaces, with the subtitle layer on top.

softworkz avatar Nov 04 '23 22:11 softworkz

The advantage of d3d is that it fully abstracts hardware. We prefer that over closed techs like cuda.

The elephant in the room is the pass through decoder, which is still needed for xbox and doesn't allow any kind of frame manipulation.

brabebhin avatar Nov 05 '23 06:11 brabebhin

The advantage of d3d is that it fully abstracts hardware. We prefer that over closed techs like cuda.

Both are closed (in the sense of not being open source), but the CUDA and Intel frameworks are cross-platform, and Intel's is partially open source. But the primary point is that ffmpeg doesn't provide D3D11 filters, while there are many hw filters for Nvidia and Intel, which makes implementing something in that area pretty convenient.

But a pure D3D based implementation is still a serious option to consider of course.

The elephant in the room is the pass through decoder, which is still needed for xbox and doesn't allow any kind of frame manipulation.

So it's not possible to work with D3D surfaces on xbox?

It still does hw accelerated playback, though, no?

softworkz avatar Nov 05 '23 07:11 softworkz

Writing code that supports only a subset of available hardware, no matter how big that is, is not convenient. This is why we are avoiding CUDA and Intel frameworks.

I don't know how Xbox does it, never debugged it; some people reported that only pass-through works on Xbox with acceptable performance. I suppose there are some differences in how the D3D pointer that we use for DirectX decoding is handled by MF on Xbox.

brabebhin avatar Nov 05 '23 12:11 brabebhin

If we'd include libass, my plan would be to output them just as regular image subtitles. I don't see a reason why this should not work. Libass renders text subs to bitmaps, we turn them into SoftwareBitmaps and add them to the ImageCues. MediaPlayer renders them on top of the video image using HW acceleration.
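That pipeline could be sketched roughly as follows. This is only a sketch under assumptions: the BGRA pixel buffer is assumed to come from blending libass output (that blending step is not shown), while `SoftwareBitmap`, `ImageCue` and `TimedMetadataTrack` are the actual WinRT types involved:

```csharp
using System;
using System.Runtime.InteropServices.WindowsRuntime; // for byte[].AsBuffer()
using Windows.Graphics.Imaging;
using Windows.Media.Core;

static class SubtitleCueFactory
{
    // Wrap a pre-rendered subtitle image (e.g. blended from libass output,
    // assumed to be premultiplied BGRA) into an ImageCue.
    public static ImageCue CreateImageCue(byte[] bgraPixels, int width, int height,
                                          TimeSpan start, TimeSpan duration)
    {
        var bitmap = SoftwareBitmap.CreateCopyFromBuffer(
            bgraPixels.AsBuffer(),
            BitmapPixelFormat.Bgra8, width, height,
            BitmapAlphaMode.Premultiplied);

        return new ImageCue
        {
            SoftwareBitmap = bitmap,
            StartTime = start,
            Duration = duration
            // Position/Extent could be set here to place the image
            // relative to the video frame.
        };
    }
}

// Usage sketch: add cues to an image-subtitle metadata track and register
// it with the MediaPlaybackItem like any other metadata track.
// var track = new TimedMetadataTrack("ass", "und", TimedMetadataKind.ImageSubtitle);
// track.AddCue(SubtitleCueFactory.CreateImageCue(pixels, w, h, start, duration));
```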

Using burn-in subtitles has many disadvantages. It is limited to the video's resolution, so especially if you have a lower-res video, burned-in subs look blurry. It is also slow if you do it in software. Since ffmpeg filters do not support D3D11, using its filters would automatically mean doing it slowly in CPU memory. Plus, when using GPU video decoding, it would mean copying decoded video frames from GPU to CPU, doing sub rendering in software, then copying back from CPU to GPU for rendering. That's a heavy performance penalty.

lukasf avatar Nov 05 '23 19:11 lukasf

If we'd include libass, my plan would be to output them just as regular image subtitles. I don't see a reason why this should not work

Proper ASS rendering means providing images at a framerate similar to the video's (or half the fps at minimum), and the ImageCue feature is not made for that.

ASS subtitle animations are a crucial feature, especially in the area of Anime.

Here are some examples in case you're not familiar:

  • https://youtu.be/t-LXqxc9cPw?t=255
  • https://youtu.be/t-LXqxc9cPw?t=453

Using burn-in subtitles has many disadvantages. It is limited to the video's resolution, so especially if you have a lower-res video, burned-in subs look blurry.

Correct - it's always the last resort option.

It is also slow if you do it in software. Since ffmpeg filters do not support D3D11, using its filters would automatically mean doing it slowly in CPU memory. Plus, when using GPU video decoding, it would mean copying decoded video frames from GPU to CPU, doing sub rendering in software, then copying back from CPU to GPU for rendering. That's a heavy performance penalty.

Yup, that's why I've created the textsub2video filter for uploading subtitle frames into a hw context, so the overlay can be done in hw.

Since ffmpeg filters do not support D3D11

The QSV filters do - but only with Intel hw...

softworkz avatar Nov 05 '23 19:11 softworkz

Then there's OpenCL and Vulkan, for both of which filters exist in ffmpeg. You can derive an OpenCL hw context from QSV and CUDA contexts. It doesn't help much in the case of Nvidia though, because you still can't get D3D surfaces. AMD is a bit late to the game, but I'm currently in contact with them, as they are about to introduce an AMF hw context to ffmpeg, including a set of filters. On Windows their backend API is D3D, so for AMD it will probably work as well.

softworkz avatar Nov 05 '23 19:11 softworkz

We can't really ignore AMD. As long as they provide iGPUs, their hardware has to be supported.

brabebhin avatar Nov 06 '23 09:11 brabebhin

Let's not forget that this lib is a playback lib, and we are rendering to a MediaPlayerElement which uses D3D11 tech. I don't think that any proprietary cuda or whatever could help us here (even if we'd ignore the AMD question). We need the images in D3D11.

It could be that ImageCue is not fast enough for animations. The frame rate does not really have to be in sync with the video if you do not burn in, but frame times need to be more or less stable and the frame rate not too low. I'd think it's worth a try. But there does not seem to be any progress on the meson build PR at libass, unfortunately, so it is difficult to integrate it into our build system.

If ImageCues really do not work, then things would become really messy. Either we'd have burn-in, with all its downsides. Or we'd have to come up with a completely custom subtitle renderer, fully synced up with the actual video. I don't like that idea, also because it's not UI agnostic, so a UWP renderer would be needed as well as a WinUI renderer. Well, there would even be a third option: if your filter can render ASS subs into a video stream, then we could expose that stream as a separate MediaStreamSource. The subtitles could be rendered to a MediaPlayerElement right above the "normal" MediaPlayerElement, synced up using MediaTimelineController. But it's also quite a complicated hack. This all sounds pretty bad to me.
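For reference, the "second MediaPlayerElement synced via MediaTimelineController" idea would look roughly like this. A sketch only: `videoSource`/`subtitleSource` stand for hypothetical MediaPlaybackItems (the subtitle one coming from a separate MediaStreamSource of rendered subtitle frames), and `videoElement`/`subtitleElement` are assumed to be two stacked MediaPlayerElements in XAML. The controller, command-manager and player APIs themselves are the real WinRT ones:

```csharp
using Windows.Media.Playback;

// One shared clock drives both players.
var controller = new MediaTimelineController();

var videoPlayer = new MediaPlayer { Source = videoSource };
var subtitlePlayer = new MediaPlayer { Source = subtitleSource };

foreach (var player in new[] { videoPlayer, subtitlePlayer })
{
    // A MediaPlayer only follows a timeline controller after its
    // command manager has been disabled.
    player.CommandManager.IsEnabled = false;
    player.TimelineController = controller;
}

// videoElement / subtitleElement: MediaPlayerElements stacked in XAML,
// with subtitleElement rendered on top of videoElement.
videoElement.SetMediaPlayer(videoPlayer);
subtitleElement.SetMediaPlayer(subtitlePlayer);

controller.Start(); // both players now follow the shared timeline
```

Whether the upper element can actually composite with transparency over the lower one is exactly the open question raised below.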

lukasf avatar Nov 06 '23 19:11 lukasf

Technically we could create our own MPE with frame server mode, which would be reasonably reusable between UWP and WinUI, just with different namespace imports. But we would still rely on DirectX. I don't think CUDA/whatever would be different from the software decoder in how they would fit into the decoder pipeline; they all eventually end up in a DirectX surface. I don't think ignoring DirectX on Windows is a good idea. I can see why Intel, Nvidia and AMD (I wonder when Qualcomm will too) would try to push their own tech stacks, but I don't think we have the manpower to deal with that arms race. DirectX makes things simple and portable across hardware, and performance is more than enough.

brabebhin avatar Nov 06 '23 21:11 brabebhin

We need the images in D3D11. Let's not forget that this lib is a playback lib

Correct. Only with QSV would it work. But having spent significant time on this subject over the past years (we probably have the best-in-class automatic generation of ffmpeg video processing pipelines for realtime transcoding), I would agree that this is out of scope for this library, as it really takes a lot of time to get all of this working, even though subtitles are just one part of the story.

It could be that ImageCue is not fast enough for animations. The frame rate does not really have to be in sync with the video,

It's not that much about being fast or being in sync - it's about being contiguous, i.e. showing one subtitle image after the other, without gaps (=> flickering) and without overlap (=> tearing).

So it is difficult to integrate it into our build system

Why is that? I have it in a Visual Studio solution as a VC project, and for production we're building with gcc on MSYS2.

If ImageCues really do not work, then things would become really messy. Either we'd have burn in, with all its downsides. Or we'd have to come up with a completely custom subtitle renderer, fully synced up with the actual video. I don't like that idea. Also because it's not UI agnostic, so a UWP renderer would be needed, as well as a WinUI renderer.

I wouldn't like that either. The only two good things I can note are:

  • libass can tell you whether there's a change in frame content without re-rendering it. This means that processing can be done very adaptively, and you only need to process at high fps when there's really animated content to be presented
  • There's another filter for outputting graphical subtitles. It works by detecting regions with content, so that the output can be just one or a few bitmaps at a time, as required by those formats. Such algorithms could be used to avoid dealing with large frames. The downside, though, is that it also gets more complicated when you have to deal with multiple (possibly animated) D3D surfaces.
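The first point can be sketched as a render loop. Assumptions are flagged: the P/Invoke declaration below is a plausible C# mapping of libass's real `ass_render_frame()` (whose `detect_change` out-parameter reports whether the frame differs from the previous one), the DLL name and the `emitCue` callback are hypothetical, and blending the returned image list into a bitmap is left out:

```csharp
using System;
using System.Runtime.InteropServices;

static class Libass
{
    // Sketch of a binding for libass's renderer entry point; the actual
    // DLL name depends on how libass is built ("ass.dll" is assumed here).
    // C signature: ASS_Image *ass_render_frame(ASS_Renderer *, ASS_Track *,
    //                                          long long now, int *detect_change);
    [DllImport("ass.dll", CallingConvention = CallingConvention.Cdecl)]
    public static extern IntPtr ass_render_frame(
        IntPtr renderer, IntPtr track, long nowMs, out int detectChange);
}

static class AdaptiveSubtitleRenderer
{
    // Hypothetical adaptive loop: step at a high rate, but only rebuild
    // the subtitle image (and thus emit a new cue) when libass reports a
    // change. Ending each cue where the next one starts keeps the cues
    // contiguous - no gaps (flickering) and no overlap (tearing).
    public static void RenderLoop(IntPtr renderer, IntPtr track,
                                  long durationMs, Action<IntPtr, long> emitCue)
    {
        const long stepMs = 20; // ~50 fps upper bound for animated content

        for (long now = 0; now < durationMs; now += stepMs)
        {
            IntPtr imageList = Libass.ass_render_frame(renderer, track, now, out int changed);
            if (changed != 0)
            {
                // emitCue would blend the ASS_Image list into a bitmap
                // and close the previous cue at 'now'.
                emitCue(imageList, now);
            }
        }
    }
}
```

For static dialogue lines this degenerates to one cue per subtitle event, while animated sections produce cues at up to the step rate.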

The subtitle could be rendered to a MediaPlayerElement right above the "normal" MediaPlayerElement, synced up using MediaTimelineController.

Do those elements even support transparency?

softworkz avatar Nov 06 '23 21:11 softworkz

I don't think CUDA/whatever would be different from the software decoder in how they would fit into the decoder pipeline,

It would be different because the nvidia decoders can output frames directly into a CUDA context.

I don't think ignoring directx on windows is a good idea. I can see why intel, nvidia and amd (i wonder when qualcomm will too) would try to push their own tech stacks,

I think a main reason is platform independence. All of them support interoperability with OpenCL and Vulkan, so it's not really about fencing strategies (well, not in this case at least). And I think Intel did it best, because their QSV stack is platform independent, but built on D3D for Windows and VAAPI for Linux. AMD is about to follow a similar path.

So, it's all not that bad from the industry's side, but none of this helps us with this specific problem. I rather see the shortcoming here on the side of MS, because they created all the decoder APIs for D3D11 (and DXVA2 previously), but they didn't do much in the area of hardware image processing (some APIs were added for D3D11, but ffmpeg doesn't support them).

softworkz avatar Nov 06 '23 21:11 softworkz

This library is Windows-only and always will be, so portability in this case is all about hardware support. I've dealt enough with hardware-specific GPU APIs to know I don't want to deal with that again soon.

If an API can provide DirectX levels of hardware portability and performance with better effect support, then it would be worth pursuing as a replacement for D3D11.

brabebhin avatar Nov 06 '23 21:11 brabebhin

Yes, we're bound to D3D11 here, that's probably out of question.

I have some more leads that might be helpful (or not at all):

  • Checking out how MPV integrates libass: MPV is based on ffmpeg as well, and while they are not using D3D, they are dealing with other hw contexts, so it might be interesting to take a look at how they integrate libass rendering into their pipelines
  • Taking a closer look at Media Foundation: I'm quite familiar with DirectShow, but I never looked into MF, so I don't know whether MF might have something to offer that could be useful for this task
  • Game development libs: for the case of going down the route of custom rendering, there might be some useful libs which can take care of the heavy lifting with regard to managing and updating D3D surfaces

softworkz avatar Nov 06 '23 21:11 softworkz

Just for the record, this is solely a philosophical conversation.

MF can handle the ASS subs just fine. MPE and MediaPlayer are based on MF interfaces that are easily accessible to everyone in C++. The trouble here is that using those interfaces directly is complicated, it essentially involves rewriting the MPE, and it does not guarantee that it will actually fix the problem of subtitle fonts. I had a look at this some time ago. It is theoretically doable, and it would be great to have our own MPE. But it's a lot of work, possibly for nothing.

Theoretically we could implement subtitle rendering using Win2D. At least for the non-animation stuff this should be pretty easy. I actually tried that at some point and had SUB/SRT going quite well.

brabebhin avatar Nov 06 '23 22:11 brabebhin

Can you confirm that it's the same on your side, i.e. that you can set the font to Bold but it ignores the FontFamily setting?

This is indeed the case. Sorry for the late response. I think accessibility settings override this: the font is something you can explicitly set in Windows settings; bold is not.

brabebhin avatar Nov 14 '23 21:11 brabebhin

I think accessibility settings override this

Yup, I've found the same. On WinUI it's even worse: the background boxes are shown, but the text is completely invisible. You can only make it visible by applying a text effect in Windows settings. What a mess.

softworkz avatar Nov 14 '23 21:11 softworkz

On the other hand it means they are doing things with subtitles, and might also fix the frame server mode bug.

brabebhin avatar Nov 16 '23 21:11 brabebhin

On the other hand it means they are doing things with subtitles,

Doing? Maybe...but for the worse, not for the better. On UWP they are visible at least... 😆

softworkz avatar Nov 16 '23 22:11 softworkz

I'm sure there's some mechanism to deactivate the influence of accessibility settings on this. But it might once again take hours or days to find out.

softworkz avatar Nov 16 '23 22:11 softworkz

I don't believe it is worth looking into it. This is clearly a bug. And one that occurred recently. It was working fine a few weeks ago.

The accessibility settings shouldn't produce invisible subtitles lol. That's kind of the point of accessibility.

I'm sure there's some mechanism to deactivate the influence of accessibility settings on this. But it might once again take hours or days to find out.

You would probably have to dig into the media foundation interfaces that these classes wrap. I had them mapped at some point. They are almost a 1 for 1 map from media foundation to the Windows.Media namespace.

brabebhin avatar Nov 16 '23 22:11 brabebhin

The accessibility settings shouldn't produce invisible subtitles lol. That's kind of the point of accessibility.

Yes, I'm sure they will fix it, but what I want is for the accessibility settings not to have any influence on our app, because it has its own subtitle settings configuration, and it's hard to sell that some parts need to be configured here and other parts there.

softworkz avatar Nov 16 '23 22:11 softworkz