FFmpegInteropX
Bitmap Subtitles with WinUI3
Just wanted to check back: Has this ever worked for any of you so far?
I must admit I haven't actually tried this in WinUI 3, but this worked in UWP (at some point). I will give it a try again tonight.
It's working fine in UWP!
Ah I see where this is going lol.
It's not crucial right now as we're using MPV as the primary player for the WinUI app, but @lukasf had introduced the WindowId parameter to the CreateFromUriAsync() method, so I was wondering whether he ever got it working.
Nope. It is one of the things that still does not work on WinUI. It does not have anything to do with the WindowId. ImageCues just don't render at all.
Yep, it is not working.
> Nope. It is one of the things that still does not work on WinUI. It does not have anything to do with the WindowId. ImageCues just don't render at all.
Alright. Thanks for clarifying.
So I think I figured out how to use RenderSubtitlesToSurface: you need to use a CanvasRenderTarget (off-screen rendering) with Win2D.
This engine seems to render subtitles in a completely different manner compared to the XAML composition-based subtitles.
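Here's a minimal sketch of what that could look like (assuming frame-server mode per the docs linked below, a shared CanvasDevice, and placeholders like `mediaPlayer` and the fixed 1920x1080 / 96 DPI target; this is not the actual FrameServerRenderer code):

```csharp
using Microsoft.Graphics.Canvas;
using Windows.Media.Playback;

var canvasDevice = CanvasDevice.GetSharedDevice();

// Off-screen Win2D render target; a SoftwareBitmap reportedly fails here.
var renderTarget = new CanvasRenderTarget(canvasDevice, 1920, 1080, 96);

mediaPlayer.IsVideoFrameServerEnabled = true;
mediaPlayer.VideoFrameAvailable += (MediaPlayer sender, object args) =>
{
    // Order matters: copy the video frame first, then subtitles on top.
    sender.CopyFrameToVideoSurface(renderTarget);
    sender.RenderSubtitlesToSurface(renderTarget);

    // Present renderTarget on the UI thread, e.g. by drawing it into a
    // CanvasImageSource or a CanvasControl.
};
```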
https://learn.microsoft.com/en-us/windows/uwp/audio-video-camera/play-audio-and-video-with-mediaplayer#use-mediaplayer-in-frame-server-mode
Doesn't this code work...?
No, not quite. I mean it works for rendering video, but not subtitles. Clearly whoever wrote that doc didn't bother checking: the example would keep throwing InvalidArgumentExceptions. However, I thought that maybe whoever implemented RenderSubtitlesToSurface wasn't shipping a completely useless API in the public WinRT interface, so I tried several things.
The current state of my WIP implementation is here
https://github.com/brabebhin/MayazucMediaPlayer/blob/main/source/MayazucNativeFramework/FrameServerRenderer.cpp
> No, not quite. I mean it works for rendering video, but not subtitles. Clearly whoever wrote that doc didn't bother checking: the example would keep throwing InvalidArgumentExceptions.
The code is taken from this sample: https://github.com/MicrosoftDocs/windows-dev-docs/blob/docs/uwp/audio-video-camera/code/MediaPlayer_RS1/cs/MainPage.xaml.cs
At least at some point it must have worked.
Here's one more code sample I found: https://github.com/drasticactions/WinUIEx/commit/4ad8524a6cf4aee6abbc66df248f8aea4a4ca767
Interestingly, there's also some mention of "crashes".
But I have a theory...
I doubt it. I've been trying that API ever since it came out, way before WinUI. This is the first time I managed to make it not crash.
I was probably the only crazy dude in the world trying it. It only became a focal point when WinUI 3 shipped without MediaPlayerElement (MPE) and others started looking at it. I even submitted bug reports and crash dumps in the Feedback Hub, all to be ignored.
Have you tried running the sample as is, i.e. 100% unchanged?
I didn't run exactly that code; I never knew it existed. But the code I had was practically identical to that. The catch is that you can't render the subtitles to the software bitmap object they have there, only to a CanvasRenderTarget. My theory is that it is some limitation in XAML, and that Media Foundation only accepts an off-screen buffer to render to. Probably because the IMFMediaEngine interface of Media Foundation, which is basically the native implementation of MediaPlayer, has a swap-chain mode, and in a swap chain you always render to the back buffer.
Or they could simply be rendering XAML composition elements (like they do with the ordinary subs) to a bitmap, and for some reason they can only do it with a back buffer.
> But I have a theory...
My theory is that whoever created WinRT back in the early 2010s has long since left Microsoft or was fired in one of the yearly purges, and the subsequent teams don't have the capacity to understand the complexities of the system involved; they get purged as well before they have time to build up the knowledge, and the trend continues to this day. This explains why WinUI, UWP, MAUI and even WinRT itself have some unexplainable bugs that nobody is trying to fix - they simply don't know how, and don't have the possibility to learn how to fix them.
I have left a comment over in the other repo, here's a copy:
Hi @dotMorten,
we at FFmpegInteropX are wondering about the intended use of the RenderSubtitlesToSurface() method as well, also having seen "crashes".
I haven't worked on it myself, just heard the reports saying it's crashing. Recently I re-read the docs and had a certain suspicion, yet it didn't match the symptoms of "always crashing", so I abandoned the idea. Now I found this commit of yours which says:
> this'll only render on one frame until next text. Crash if repeating without this flag getting set.
That's quite different from "always crashing" and actually aligns with what I had thought earlier. There is a paragraph in the docs which caught my attention.
For the overload with a target rectangle:
reference: https://learn.microsoft.com/en-us/uwp/api/windows.media.playback.mediaplayer.rendersubtitlestosurface?view=winrt-22621
=> So why should this method be "less efficient" than the other overload? This method renders to a constrained area only and the other one renders to the full-size frame. How can it be "less efficient"?
There are two more hints:
- The full-frame overload has this text: "if the method returns false, then no subtitles were rendered. In this case you may decide to hide the subtitle render surface in your UI."
- The rectangle overload also says: "but it allows you to use the same surface for rendering video and subtitles rather than requiring a separate surface for subtitles."
Rendering onto the same surface as the video frames comes with the specific implication that you need to redraw the subtitle for every single video frame that is shown, because each new video frame overwrites everything from before. To make this more economical (just think of 4K video at 60fps), you can/must constrain the rendering to a certain area (no idea how to determine it, actually). And - if I'm right - that would be the explanation why they say that it's less efficient: even though the area is constrained, you need to do it for every video frame.
I suspect that the other overload is meant to be used in a very different way: you create an additional (transparent) surface on top of the video frame and at the same size. When there's a subtitle graphic to display, you call RenderSubtitlesToSurface() to paint it onto your subtitle surface. As this is a separate surface, it doesn't get overwritten by the video frames.
=> In turn, you need to render it just once. And that would be the explanation why you were seeing subsequent calls failing.
Conclusion would be: this method is not meant to be called "per-frame" but only once on each change, and is meant to paint on a separate surface, while the other overload is meant to paint on each video frame, constrained to a certain region.
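A minimal sketch of the two patterns under that theory (all names - `mediaPlayer`, `subtitleSurface`, `videoSurface`, `width`, `height` - are placeholders, and the subtitle band rectangle is a pure assumption):

```csharp
using Windows.Foundation;

// Placeholders: mediaPlayer and the two IDirect3DSurface targets are
// assumed to exist; width/height is an assumed frame size.
double width = 1920, height = 1080;

// Pattern A - full-frame overload: a separate transparent subtitle surface,
// redrawn only when the cues change.
bool hasSubs = mediaPlayer.RenderSubtitlesToSurface(subtitleSurface);
if (!hasSubs)
{
    // Nothing was rendered: per the docs, hide the subtitle layer in the UI.
}

// Pattern B - rect overload: render onto the same surface as the video,
// once per video frame, constrained to a region (here an assumed bottom band).
mediaPlayer.CopyFrameToVideoSurface(videoSurface);
mediaPlayer.RenderSubtitlesToSurface(videoSurface,
    new Rect(0, height * 0.75, width, height * 0.25));
```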
I agree that the best way is to have 2 bitmaps, one for video and one for the subs. However, I don't think that has anything to do with the crash I was experiencing, and what the WinUIEx devs were having was probably a completely different problem - they are already doing the off-screen rendering with the back buffer of the swap chain.
However, in my case, I don't think there's going to be a problem separating the implementation into 2 bitmaps. But at this point I am just overly happy I finally figured out how to do it.
> I don't think that has anything to do with the crash
> what the WinUIEx devs were having was probably a completely different problem
> I am just overly happy I finally figured out how to do it
That's why I posted over there: I knew you wouldn't be much interested in testing my theory... 😆
There are some interesting points in your theory.
- I already tried rendering to the same surface earlier today, without the overload that takes a rect region to render to, on every video frame. If the surface is a software bitmap, it crashes; if the surface is a CanvasRenderTarget, it works fine. More so, it will render the subtitles exactly as you would expect them to be rendered (bottom centered, with proper styles and such), and the area where no subs are shown can be transparent, thus obtaining the effect you are aiming for. You just need to correctly order the operations before issuing the GPU commands: first the video frame, then the subtitles. Note that DirectX rendering operations are not issued to the GPU until a Flush or Close command is called on the command list (in our case, the command list is a drawing session), so you can accumulate lots of render commands in a sequence before anything is actually rendered (see the sketch after this list).
- The current implementation sort of uses 2 surfaces, one for video and one for subtitles, and then merges both into the final output to be displayed, which is one image. Both surfaces have the exact same size: the size of the final image to be displayed, which comes from XAML.
- I suspect the overload which takes a rect to limit the render region is meant for specific use cases, such as custom padding around the subs area (I know you were interested in this), shifting the subs up when the transport controls show up, games in which the subs should always be rendered in a specific area of the viewport, etc.
- The only reason to use 2 different output XAML Images is so that you don't need to render subs on every video frame, and you can technically multithread them as well. So extra performance.
- The documentation is completely bogus: you can use either overload with any valid DirectX surface, no matter whether the surface was rendered to before or not. It would have been much more useful to explain the requirements imposed on the DirectX surface, rather than bogging us down with all sorts of plainly visible optimizations (like hiding the subtitle surface when there's no sub to draw). It is really debatable whether this is the right way to do it, or whether we should render a transparent surface using DirectX anyway. It depends on the XAML performance; I heard bindings to dependency properties are quite slow.
- The so-called bug is most likely not a bug. The MediaPlayer implementation likely has some very strict requirements on how the DirectX surface should be configured. Those can be retrieved from the Desc of the DXGI surface that's wrapped behind the CanvasRenderTarget. The problem is that nobody bothered documenting those requirements, and the sample code is wrong. This comes back to my theory: whoever did this is long gone, and none of the newcomers can figure out what they did.
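A rough sketch of the ordering described in the first point, assuming two off-screen targets (`videoTarget`, `subtitleTarget`) already filled via CopyFrameToVideoSurface / RenderSubtitlesToSurface and merged into an output target; all names are placeholders, not the actual FrameServerRenderer code:

```csharp
using Microsoft.Graphics.Canvas;
using Microsoft.UI;

// Nothing is submitted to the GPU until the drawing session is closed,
// so the whole merge is issued as one batch of commands.
using (CanvasDrawingSession ds = outputTarget.CreateDrawingSession())
{
    ds.Clear(Colors.Transparent);  // start from a transparent output
    ds.DrawImage(videoTarget);     // 1) the video frame first
    ds.DrawImage(subtitleTarget);  // 2) subtitles composited on top
}   // Dispose() closes the session and flushes the batched commands
```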
If you can share one of those files with bitmap subtitles, I can gladly test to see if they work (the external disk that was storing mine died a few days ago).
> - I already tried rendering to the same surface earlier today, without the overload that takes a rect region to render to, on every video frame. If the surface is a software bitmap, it crashes;
Does it "crash" always or does it succeed once each time when there's a subtitle change?
> The current implementation sort of uses 2 surfaces, one for video and one for subtitles, and then merges both into the final output to be displayed,
That's the worst possible way actually...
I mean: none of these procedures is suitable for production use anyway. There's just a single right way for this:
- Compressed video goes into GPU/GpuMem
- GPU decodes video frames into surfaces (in GpuMem)
- The app only tells the GPU at which point in time it should present each surface
- shown frames go back into the pool for re-use
- no video frame ever leaves GpuMemory
> rather than bogging us down with all sorts of plainly visible optimizations (like hiding the subtitle surface when there's no sub to draw). It is really debatable whether this is the right way to do it, or whether we should render a transparent surface using DirectX anyway.
The latter is exactly what they are suggesting... Having a transparent layer in the scene still requires calculation which can be saved by hiding the layer, even though it's transparent anyway.
> If you can share one of those files with bitmap subtitles, I can gladly test to see if they work (the external disk that was storing mine died a few days ago)
Check out these:
- https://github.com/softworkz/subtitletestsamples
- https://github.com/softworkz/SubtitleFilteringDemos/tree/master/Demo1
Let me know when you need something different. Thanks
Thanks for the files. Here we go
It's funny, because the "Media Player" app of Windows 11 doesn't even detect the subtitle. But clearly WinRT and MF possess the capability to deal with the subtitle.
Am I going insane here?
I still can't believe this works; I expect there's some gotcha that I'm missing.
> That's the worst possible way actually...
> I mean: none of these procedures is suitable for production use anyway. There's just a single right way for this:
> - Compressed video goes into GPU/GpuMem
> - GPU decodes video frames into surfaces (in GpuMem)
> - The app only tells the GPU at which point in time it should present each surface
> - shown frames go back into the pool for re-use
> - no video frame ever leaves GpuMemory
I agree it is not production ready; however, no video frame ever leaves GPU memory. By the time the frame server's VideoFrameAvailable event triggers, the video has already been decoded and exists in GPU memory (be it dedicated or integrated). This intercepts the frame right before it is presented to the user. At this point, the only way to even copy the data into CPU memory, I believe, is to cause a pipeline stall, which will instantly kill performance and drop it to like 1 FPS or something.
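For illustration, a hypothetical way to trigger exactly that kind of stall from Win2D (`renderTarget` being the off-screen target from the earlier sketch): reading the pixels back synchronously forces the driver to wait for all pending GPU work touching that surface.

```csharp
// Synchronous GPU -> CPU readback: blocks until every queued command
// touching this surface has completed (a pipeline stall).
byte[] pixels = renderTarget.GetPixelBytes();
```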
Does it "crash" always or does it succeed once each time when there's a subtitle change?
I could never, ever get a subtitle to render to a SoftwareBitmap.
> which will instantly kill performance and drop it to like 1 FPS or something.
No, software is not that bad. Remember, FFmpegInteropX has a setting "FFmpegSoftwareDecoder". It's primarily taking resources and draining batteries empty on laptops.
> I agree it is not production ready; however, no video frame ever leaves GPU memory. By the time the frame server's VideoFrameAvailable event triggers, the video has already been decoded. This intercepts the frame right before it is presented to the user.
You are calling "RenderVideoToSurface" and "RenderSubtitleToSurface", then you are "merging" the surfaces together before presenting. Even when it's in GPU memory, that's still a lot of operations; each time, something gets copied.
Is the Win2D canvas doing everything, and always in GPU memory?
> It's funny, because the "Media Player" app of Windows 11 doesn't even detect the subtitle. But clearly WinRT and MF possess the capability to deal with the subtitle.
> Am I going insane here?
> I still can't believe this works; I expect there's some gotcha that I'm missing.
The subs are looking fine!
Can you check whether the outline issue exists when doing it this way?
The performance problem of the pipeline stall lies in the CPU-GPU synchronization at driver level, not in the performance of either one of the components. It is easy to transfer data from CPU to GPU, which is what the software decoder does. The other way around is much, much harder. There are also hardware constraints in the architecture of the PCIe lanes; the topic is pretty complex.
The GPU is designed to handle multiple drawing commands and copy data inside its GDDR memory, and it is very good at doing that. I don't believe the current implementation has major performance issues, but I will implement the separated drawing of subs onto a different surface anyway. There are multiple reasons to do the separation, not just the performance gain. There's also the question of HDR, which I haven't tested at all.
Glad to know the subs look good. I will check the outline as well, once I emerge from the nightly hibernation cycle.
Yes, the Win2D canvas does all data manipulation in GPU memory. You may be able to initialize it with data from system memory, but once that's done, everything stays on the GPU.
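A hypothetical illustration of that one-time upload (size and content are placeholders): the pixel bytes cross from system memory to GPU memory once at creation, and all subsequent drawing with the bitmap stays on the GPU device.

```csharp
using Microsoft.Graphics.Canvas;
using Windows.Graphics.DirectX;

byte[] pixels = new byte[640 * 480 * 4];  // BGRA8 placeholder content
CanvasBitmap bitmap = CanvasBitmap.CreateFromBytes(
    CanvasDevice.GetSharedDevice(), pixels, 640, 480,
    DirectXPixelFormat.B8G8R8A8UIntNormalized);
```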
Following up on your comment on WinUIEx's subtitle rendering: I honestly don't remember the details - I've since deleted the MediaPlayer implementation because WinUI now has it built-in, so it was no longer needed. I looked at WinUI 3's source code and I don't see any use of the subtitle rendering stuff - the closest I got was the timed text stuff, which might be worth digging into: https://github.com/microsoft/microsoft-ui-xaml/blob/winui3/release/1.5-stable/dxaml/xcp/dxaml/lib/TimedTextSource.cpp
It does seem like they just use XAML to render the subtitles judging from the comments in this issue: https://github.com/microsoft/microsoft-ui-xaml/issues/7981 and that rendering them directly into the surface doesn't work: https://github.com/microsoft/microsoft-ui-xaml/issues/6610
@dotMorten - Thanks for getting back!
> I looked at WinUI 3's source code and I don't see any use of the subtitle rendering stuff
> It does seem like they just use XAML to render the subtitles
Correct. That's done in CueStyler.cpp (and the other) for which I made 2 PRs, and there's also a pending issue of mine regarding this. That's why we're evaluating the RenderSubtitlesToSurface() method, which uses an entirely different implementation (from the Windows.Media area, presumably).
> I honestly don't remember the details
I did a GitHub search for this API, and the only real results were the old MS samples and your repo (and this one here), so we're a very small circle of people having dealt with it, apparently 😉
@softworkz I am once again going to need some test files for the outline ^^
Scripting seems to partially work for SSA/ASS, but outlines and colors are overridden by Windows settings.
This is the expected result
The subs also seem to overlay on top of each other when there are multiple active at the same time. It is worth investigating whether we are using the incorrect region when adding external subs, or whether the renderer is bugged.
I guess my next mission will be to implement my own rendering with Win2D, in order to get rid of the Windows settings nonsense.