libmpv: Severe screen corruption when rendering video via mpv_render_context_render to virtual x-server
mpv Information
This is the first version in which the problem can be reproduced.
Found by bisecting. The problem also happens in master (3ab989e554)
mpv --version
mpv bad1-dirty Copyright © 2000-2023 mpv/MPlayer/mplayer2 projects
built on Jul 18 2024 13:05:23
libplacebo version: v6.338.2
FFmpeg version: 6.1.1
FFmpeg library versions:
libavutil 58.29.100
libavcodec 60.31.102
libavformat 60.16.100
libswscale 7.5.100
libavfilter 9.12.100
libswresample 4.12.100
Other Information
- Linux version: Fedora Linux
- Kernel Version: 6.9.8-200.fc40.x86_64
- GPU Model: Intel Corporation AlderLake-S GT1
- Mesa/GPU Driver Version: mesa-libGLU-9.0.3-4.fc40.x86_64 mesa-libGLU-devel-9.0.3-4.fc40.x86_64 mesa-filesystem-24.1.2-8.fc40.x86_64 mesa-va-drivers-24.1.2-8.fc40.x86_64 mesa-libglapi-24.1.2-8.fc40.x86_64 mesa-dri-drivers-24.1.2-8.fc40.x86_64 mesa-libgbm-24.1.2-8.fc40.x86_64 mesa-libEGL-24.1.2-8.fc40.x86_64 mesa-libgbm-devel-24.1.2-8.fc40.x86_64 mesa-libGL-24.1.2-8.fc40.x86_64 mesa-libGL-devel-24.1.2-8.fc40.x86_64 mesa-libEGL-devel-24.1.2-8.fc40.x86_64 mesa-libOpenCL-24.1.2-8.fc40.x86_64 mesa-vulkan-drivers-24.1.2-8.fc40.x86_64 mesa-libxatracker-24.1.2-8.fc40.x86_64 mesa-libOSMesa-24.1.2-8.fc40.x86_64
- Window Manager and Version: mate marco
- Source mpv: from git
- Introduced in version: c172a650c4 (not possible to reproduce in 3e612c07f4)
Reproduction Steps
The problem can be reproduced using xpra:
- Start xpra in seamless mode, e.g., starting a terminal which runs on a remote computer. Under the hood this starts an X server with Xdummy or Xvfb. The problem occurs with both.
- In this terminal, start a program that uses libmpv, for instance https://github.com/v0idv0id/MPVideoCube.git or (more difficult to compile) https://github.com/deeptho/neumodvb
Sometimes the video displayed in the programs looks OK, but sometimes the video is heavily corrupted. Investigation shows:
- If the programs are run under VirtualGL on the remote computer, all is fine.
- If the programs are run directly, they sometimes show the expected output: for video cube this means (mostly) artefact-free video displayed on a cube. For neumodvb, this means a live TV channel showing artefact-free video. However, sometimes the video is completely black, or heavily corrupted: the video contains vertical/horizontal lines, or only small parts of it appear on screen, or it looks heavily pixellated. See also https://github.com/Xpra-org/xpra/issues/4300 for examples.
- If screen corruption occurs, it continues to occur until a new video is displayed. If no corruption occurs at the start, then the video remains good forever.
Expected Behavior
Non-corrupted video
Actual Behavior
Corrupted video.
Additional info:
- If an overlay is drawn on top of the video (neumodvb), after mpv renders it, that overlay looks fine. Both programs also run fine natively, not under xpra
- If in neumodvb I save the video rendered by mpv-lib, that video is also corrupted, but the overlay is not, suggesting strongly that mpv is causing the corruption
- The mpv command line client does not show corruption when playing videos
- The ONLY difference between the last good and first bad mpv version seems to be a difference in default interpolation code, but that may just "trigger" the problem, rather than being the cause.
- Once the video displayed is corrupted, the corruption stays of the same type, although resizing the window has some effect on the details of the corruption. Whether corruption occurs at all varies from run to run, so multiple trials may be needed to reproduce the problem.
Please see the screenshots here: https://github.com/Xpra-org/xpra/issues/4300
I cannot attach log files, as there are none in this use case. Or is it possible to start one in libmpv?
Log File
Sample Files
I carefully read all instruction and confirm that I did the following:
- [X] I tested with the latest mpv version to validate that the issue is not already fixed.
- [X] I provided all required information including system and mpv version.
- [X] I produced the log file with the exact same set of files, parameters, and conditions used in "Reproduction Steps", with the addition of --log-file=output.txt.
- [X] I produced the log file while the behaviors described in "Actual Behavior" were actively observed.
- [X] I attached the full, untruncated log file.
- [X] I attached the backtrace in the case of a crash.
> I cannot attach log files, as there are none in this use case. Or is it possible to start one in libmpv?
There absolutely is. Set the "log-file" option via libmpv.
I have added mpv-log-file=/tmp/mpv/log to the mpv.conf that is being loaded by libmpv in neumodvb, but it has no effect, whereas other options in that file, e.g. screenshot-directory=/tmp/screenshots, work as expected.
> I have added mpv-log-file=/tmp/mpv/log to the mpv.conf that is being loaded by libmpv in neumodvb, but it has no effect, whereas other options in that file, e.g. screenshot-directory=/tmp/screenshots, work as expected.
It's not mpv-log-file=, it's log-file=.
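For reference, a minimal sketch of the corrected entry as it would appear in the mpv.conf loaded by libmpv. The paths are the ones mentioned in this thread; the comment line is only illustrative.

```
# mpv.conf loaded by the embedding application
log-file=/tmp/mpv/log
screenshot-directory=/tmp/screenshots
```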
Here is an mpv log file while the problem occurs.
- I start neumodvb
- I start displaying channel 4. There is audio but nothing is displayed
- I stop playback
- I start it again. This time there is video, but it is corrupted by black horizontal and vertical lines. This is with git version c172a650c4, which is the first version in which I can reproduce the corruption. libplacebo is at version 64c19545
This screenshot shows the corruption
Duplicate of #13998
> Duplicate of #13998
Are you sure? I don't see gpu-next being used here.
I have tried adding correct-downscaling=no to the mpv configuration. With only 5 trials I notice that:
- The vertical/horizontal lines on 16:9 content do not seem to appear
- The problem that the screen remains black (no video) at the first trial is still there
- Other forms of corruption are also still there. See picture below. The strange thing is that these corruptions do not occur at each trial, so it must have something to do with initialisation. Note that I did not resize the window manually, so the scaling is always the same.
Adding profile=fast produced no pictures at all
> > Duplicate of #13998
> Are you sure? I don't see gpu-next being used here.
I'm not sure what's going on here. I think we are looking at multiple different issues. For example, the screenshot from https://github.com/mpv-player/mpv/issues/14577#issuecomment-2251558243 shows corruption that happens with Intel when using gather. Indeed, the previous report was about Windows and gpu-next, though the symptoms are the same. The first broken commit https://github.com/mpv-player/mpv/commit/c172a650c41a28d77d14de4af398cfd90caaa805 makes it clear we have some issue when downscaling, which is the same case as in the other issue.
> The vertical/horizontal lines on 16:9 content do not seem to appear
Ok, so it seems to confirm that at least part of the problem is the same as the other one.
> Adding profile=fast produced no pictures at all
That's worrying, because in this mode, we really don't do much work.
[ 0.014][v][libmpv_render] GL_VERSION='4.5 (Compatibility Profile) Mesa 24.1.2'
[ 0.014][v][libmpv_render] Detected desktop OpenGL 4.5.
[ 0.014][v][libmpv_render] GL_VENDOR='Mesa'
[ 0.014][v][libmpv_render] GL_RENDERER='llvmpipe (LLVM 18.1.6, 256 bits)'
[ 0.014][v][libmpv_render] GL_SHADING_LANGUAGE_VERSION='4.50'
Are you able to test with an older Mesa build? I'm curious if those issues are new or were there before.
It would also be helpful to link the code in the application where mpv is integrated. The GL rendering has some constraints and there's a lot that can go wrong.
In neumoDVB, this is the source file that handles libmpv callbacks https://github.com/deeptho/neumodvb/blob/master/src/viewer/neumompv.cc Note that depending on the choices of the user, this code also draws an overlay on top of mpv, but the issue of this ticket happens also without that overlay drawing.
Re the constraints: I am aware of those, although it is not always easy to understand them correctly. A long time ago, I also had to make some changes to prevent the whole program from crashing when more than 2 mpv playbacks were used simultaneously. This happened after some silent change in GL (but I found some comment in a GL source file).
The culprit then turned out to be illegal access from multiple threads to the same GL context. This used to work fine (of course user code has to guard with locks to prevent concurrent access), but I think now the context can only be used by the thread that created it.
If you are wondering about the convoluted construct with the thread_local variable to store the context: it is needed to solve this problem. One of the problems was that libmpv uses different threads for the callbacks made by different video playbacks and the user code has to detect when it is called from two different threads.
Regarding the issue of this ticket, this is not relevant, as only one playback is running in the tests.
I found the libmpv documentation you link to a bit misleading: "This assumes the OpenGL context lives on a certain thread controlled by the API user." => it is libmpv that creates and controls the thread, not the API user. The API user indeed controls the context but not the thread and should be prepared for suddenly being called from a different thread.
> it is libmpv that creates and controls the thread, not the API user. The API user indeed controls the context but not the thread and should be prepared for suddenly being called from a different thread.
This is incorrect.
mpv will call the update callback on any thread it wants, but you must consistently use mpv_render_context_render on the thread that has the OpenGL context.
You can see in this example how it's done with an event and on_mpv_render_update does not itself call any render functions.
Looking at neumompv.cc you seem to be doing this correctly.
In any case it should be easy to reproduce this bug with one of the mpv examples.
> > it is libmpv that creates and controls the thread, not the API user. The API user indeed controls the context but not the thread and should be prepared for suddenly being called from a different thread.
> This is incorrect. mpv will call the update callback on any thread it wants, but you must
Yes, that is what I wrote: "it is libmpv that creates and controls the thread, not the API user."
> consistently use mpv_render_context_render on the thread that has the OpenGL context.
That is the thread calling the user callback, so an mpv thread and not controlled by the user. Is there any guarantee that for the same video playback, mpv calls the callback always on the same thread to render successive frames?
Otherwise it will get really complicated, as the user code callback can not draw but instead would have to delegate this task to some other thread, which would create needless context switches.
> You can see in this example how it's done with an event and on_mpv_render_update does not itself call any render functions.
No, I did not claim that mpv draws. The user callback draws, but it does that in a thread created by mpv. The surprising bit was that mpv calls from multiple threads for multiple simultaneous video playbacks and that it is then not possible to use the same GL context even when locking to prevent simultaneous access.
I can understand why libmpv would call from a separate thread for each video playback, but it would be helpful to mention this in the documentation, along with a warning that OpenGL then requires using a separate GL context per thread (it did not require that in older versions).
> Looking at neumompv.cc you seem to be doing this correctly.
Thanks for that confirmation.
> That is the thread calling the user callback, so an mpv thread and not controlled by the user. Is there any guarantee that for the same video playback, mpv calls the callback always on the same thread to render successive frames?
No. Why would you need that?
> Otherwise it will get really complicated, as the user code callback can not draw but instead would have to delegate this task to some other thread
Yes. This is what you have to do and just how the sdl example I linked works.
> The user callback draws, but it does that in a thread created by mpv.
No, this is the exact opposite of what I said. You create the OpenGL context and control the draw thread. mpv calls the callback to tell you that you should draw. Do not draw inside the mpv callback, that's broken.
> but it would be helpful to mention this in the documentation, along with a warning that OpenGL then requires using a separate GL context per thread
https://github.com/mpv-player/mpv/blob/acc69e082fff67398834de3045ef48d33d2f4d54/libmpv/render_gl.h#L31-L40
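The threading rule discussed above (the update callback may arrive on any thread, while all rendering stays on the one thread that owns the OpenGL context) can be illustrated with a small language-neutral sketch. It makes no libmpv or OpenGL calls. `RenderLoop`, `on_update` and `demo` are hypothetical names invented for this illustration, and the queue stands in for whatever event mechanism the application uses (the SDL example uses an SDL event).

```python
import queue
import threading


class RenderLoop:
    """Stand-in for an application's render thread (hypothetical, no libmpv calls)."""

    def __init__(self):
        self._wakeups = queue.Queue()
        self.render_threads = set()  # ids of threads that performed a "render"
        self.frames_rendered = 0
        self._thread = threading.Thread(target=self._run, name="render")

    def start(self):
        self._thread.start()

    def on_update(self):
        # Plays the role of the mpv update callback: it may run on any thread.
        # It only wakes the render thread and never draws itself.
        self._wakeups.put("redraw")

    def stop(self):
        self._wakeups.put("quit")
        self._thread.join()

    def _run(self):
        # In a real program the OpenGL context would be made current here,
        # once, on this thread.
        while True:
            msg = self._wakeups.get()
            if msg == "quit":
                break
            # Placeholder for the call to mpv_render_context_render().
            self.render_threads.add(threading.get_ident())
            self.frames_rendered += 1


def demo(num_callback_threads=4):
    """Fire the callback from several threads and report who actually rendered."""
    loop = RenderLoop()
    loop.start()
    workers = [threading.Thread(target=loop.on_update)
               for _ in range(num_callback_threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    loop.stop()
    return loop.frames_rendered, len(loop.render_threads)
```

Calling `demo(4)` fires the callback from four different threads, yet every simulated render happens on the single render thread. In a real application the wakeups would typically be coalesced rather than rendered one by one.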
It seems I was confused by some older, dead code in neumodvb. The rendering indeed takes place on a thread created by neumodvb, not by libmpv.
Just for the record: this screen corruption still occurs in Fedora 42. It does not occur when the application is run under VirtualGL (through vglrun). It is probably a driver bug, but it causes problems when using libmpv.