mpv
Getting VK_ERROR_DEVICE_LOST upon resizing mpv's window
Important Information
Provide the following information:
- mpv version: 0.33.0
- Linux Distribution and Version: Arch Linux
- Source of the mpv binary: Official repositories
- If known, which version of mpv introduced the problem: I have been getting this crash since v0.32.0. Searching through the issues in the mpv repository, I found a post by a developer who said the libplacebo version in the official Arch Linux repositories was too old (and it was indeed), so I waited until v0.33.0 was officially released (and therefore built against a newer libplacebo), but the issue persisted.
- Window Manager and version: mutter 3.38.2
- GPU driver and version: NVIDIA 455.45.01 (proprietary driver)
- Possible screenshot or video of visual glitches: https://www.youtube.com/watch?v=lG8X09lm6zE
Reproduction steps
First, let me say I'm not sure if I should've opened an issue here or in the libplacebo repo, so feel free to close this if it's in the wrong place.
Anyway, I could only reproduce this when combining --profile=gpu-hq with --gpu-api=vulkan. To reproduce this, pick a random video (the download link for the one I used is at the end of this post), then run mpv in a terminal window, like this:
$ mpv --no-config --profile=gpu-hq --gpu-api=vulkan BigBuckBunny.mp4
After that, start resizing the mpv window diagonally (I couldn't reproduce it when resizing horizontally or vertically), as fast as you can: mpv will freeze (and sometimes your whole desktop will too, for a brief moment).
mpv will also spam the following message in the console:
[vo/gpu/vulkan/libplacebo] vk->QueueSubmit(cmd->queue, 1, &sinfo, cmd->fence): VK_ERROR_DEVICE_LOST
[vo/gpu/vulkan/libplacebo] vk->QueueSubmit(cmd->queue, 1, &sinfo, cmd->fence): VK_ERROR_DEVICE_LOST
[vo/gpu/vulkan/libplacebo] vk->QueueSubmit(cmd->queue, 1, &sinfo, cmd->fence): VK_ERROR_DEVICE_LOST
[vo/gpu/vulkan/libplacebo] vk->QueueSubmit(cmd->queue, 1, &sinfo, cmd->fence): VK_ERROR_DEVICE_LOST
[vo/gpu/vulkan/libplacebo] vk->QueueSubmit(cmd->queue, 1, &sinfo, cmd->fence): VK_ERROR_DEVICE_LOST
[vo/gpu/vulkan/libplacebo] vk->QueueSubmit(cmd->queue, 1, &sinfo, cmd->fence): VK_ERROR_DEVICE_LOST
[vo/gpu/vulkan/libplacebo] vk->QueueSubmit(cmd->queue, 1, &sinfo, cmd->fence): VK_ERROR_DEVICE_LOST
[vo/gpu/vulkan/libplacebo] vk->QueueSubmit(cmd->queue, 1, &sinfo, cmd->fence): VK_ERROR_DEVICE_LOST
[vo/gpu/vulkan/libplacebo] vk->QueueSubmit(cmd->queue, 1, &sinfo, cmd->fence): VK_ERROR_DEVICE_LOST
[vo/gpu/vulkan/libplacebo] vk->QueueSubmit(cmd->queue, 1, &sinfo, cmd->fence): VK_ERROR_DEVICE_LOST
[vo/gpu/vulkan/libplacebo] Failed holding swapchain image for presentation
[vo/gpu] Failed presenting frame!
And if you have journalctl -f open (and an NVIDIA GPU), this Xid error will show up:
dec 03 19:22:05 arch kernel: NVRM: Xid (PCI:0000:01:00): 31, pid=83961, Ch 00000022, intr 10000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_PROP_0 faulted @ 0x1_016a1000. Fault is of type FAULT_PTE ACCESS_TYPE_WRITE
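To tell whether a freeze matches this report, the Xid code and fault type can be pulled out of the kernel log. The following is a minimal Python sketch; the regular expression is fitted to the NVRM lines quoted in this thread rather than to any documented format, and `parse_xid`/`XidEvent` are hypothetical names. (NVIDIA's Xid documentation lists Xid 31 as a GPU memory page fault.)

```python
import re
from typing import NamedTuple, Optional

# Pattern fitted to the NVRM lines quoted in this thread; other driver
# versions may word or order the message differently.
XID_RE = re.compile(
    r"NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): (?P<xid>\d+), pid=(?P<pid>\d+)"
    r".*?Fault is of type (?P<fault>\S+) (?P<access>\S+)"
)


class XidEvent(NamedTuple):
    pci: str
    xid: int
    pid: int
    fault: str
    access: str


def parse_xid(line: str) -> Optional[XidEvent]:
    """Return the parsed Xid event, or None if the line is not an Xid fault report."""
    m = XID_RE.search(line)
    if m is None:
        return None
    return XidEvent(m["pci"], int(m["xid"]), int(m["pid"]), m["fault"], m["access"])


sample = ("dec 03 19:22:05 arch kernel: NVRM: Xid (PCI:0000:01:00): 31, pid=83961, "
          "Ch 00000022, intr 10000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_PROP_0 "
          "faulted @ 0x1_016a1000. Fault is of type FAULT_PTE ACCESS_TYPE_WRITE")
print(parse_xid(sample))
```

Feeding it the output of journalctl -k line by line and keeping the non-None results gives a quick list of faults to correlate with resize attempts.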
Expected behavior
mpv should not freeze, and the NVIDIA driver should not throw an Xid error.
Actual behavior
mpv freezes and the NVIDIA driver throws an Xid error.
Log file
https://gist.githubusercontent.com/guihkx/97e8437eb059868623564f18156d667b/raw/7d51f3bed46bc72661b8c1d81c30b4ff5c518427/log.txt
Sample files
http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/BigBuckBunny.mp4
Can confirm the issue. There is no need to resize "as fast as you can"; the issue can appear on any resize, kind of randomly. Also, it's extremely likely that this is not mpv's or libplacebo's fault but NVIDIA's bad drivers.
there is no need to resize "as fast as you can"
Indeed! Today I was watching a video with my second screen enabled, and as soon as I clicked and dragged just a little to start resizing the window, the crash happened. It just seemed easier to reproduce by resizing it quickly, though. >:)
Can you still reproduce a crash/freeze without using --profile=gpu-hq, though? What about using the OpenGL back-end? I couldn't in either case...
I use gpu-hq + OpenGL daily and I've never seen anything similar happen with that.
Also, it's extremely likely that this is not mpv or libplacebo's fault but NVIDIA's bad drivers.
It's also extremely likely that mpv and/or libplacebo would benefit from having better code to recover from device loss, since that can happen for entirely legitimate reasons as well.
As far as that distinction is concerned, I'm convinced libplacebo is essentially doing the correct thing. It's forwarding the error to vo_gpu's draw_frame as the return value of pl_swapchain_submit_frame, which, according to the documentation, indicates some sort of severe failure (e.g. device loss).
So in terms of an actual change that needs to be made, it's most likely on the mpv side. More specifically, mpv needs some mechanism for signalling that the entire GPU context needs to be recreated (possibly as a new VOCTRL of some sort).
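A rough model of that proposal, written in Python for brevity. Everything here is hypothetical: `DeviceLost`, `FakeContext` and `render_loop` are invented names, not mpv or libplacebo API. The only idea taken from the discussion is that a fatal submit error should lead to tearing down and recreating the whole GPU context, rather than retrying on a device that is already lost.

```python
from typing import Callable, List, Optional


class DeviceLost(Exception):
    """Stands in for a fatal submit error such as VK_ERROR_DEVICE_LOST."""


class FakeContext:
    """Hypothetical GPU context that 'loses' its device on one chosen frame."""

    def __init__(self, generation: int, lose_on: Optional[int]) -> None:
        self.generation = generation
        self.lose_on = lose_on
        self.alive = True

    def submit_frame(self, frame: int) -> None:
        if not self.alive or frame == self.lose_on:
            self.alive = False  # once lost, every later submit fails too
            raise DeviceLost(f"frame {frame} on context {self.generation}")

    def destroy(self) -> None:
        self.alive = False


def render_loop(frames: List[int],
                make_context: Callable[[int], FakeContext],
                max_recreates: int = 3) -> List[str]:
    """Present frames, rebuilding the whole context after a device loss."""
    log: List[str] = []
    generation = 0
    ctx = make_context(generation)
    for frame in frames:
        while True:
            try:
                ctx.submit_frame(frame)
                log.append(f"presented {frame} (ctx {ctx.generation})")
                break
            except DeviceLost:
                # Retrying on a lost device cannot succeed: tear the context
                # down and build a fresh one, then retry the same frame.
                ctx.destroy()
                generation += 1
                if generation > max_recreates:
                    raise
                log.append(f"device lost at frame {frame}, recreating")
                ctx = make_context(generation)
    return log


# Only the first context fails (on frame 2); its replacement is healthy.
log = render_loop([0, 1, 2, 3], lambda g: FakeContext(g, lose_on=2 if g == 0 else None))
print("\n".join(log))
```

The `max_recreates` cap is a design choice for the sketch: if the device is lost again right after every rebuild, giving up is better than looping forever.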
So an update to their beta Vulkan driver was released yesterday:
Vulkan Beta Driver Release Updates January 27th, 2021 - Windows 457.88, Linux 455.50.04
- Fixes:
- Fixed a bug in a stencil-buffer optimization that could occasionally result in VK_ERROR_DEVICE_LOST
Before I waste my time downloading, packaging and installing this beta driver, do you guys think that fix could be related to this bug?
Cheers.
Unlikely. We don't use stencil buffers. (But maybe they do, internally. Or maybe the bug affects more than just stencil buffers.)
Yep, it's definitely not fixed. Attaching a new log file nonetheless, since mpv's output appears to have changed slightly:
https://gist.github.com/guihkx/d32c0eeaf782ae2f0140de4c87e875a9/raw/5d33de105a7b141e9b071134bcb3e3dd84b0ba01/log.txt
NVIDIA driver 455.50.04, mpv 0.33.0, libplacebo 3.104.0
I can confirm I have the same issue as well. It's not just resizing the window: going fullscreen or exiting fullscreen also causes the error very frequently.
I have noticed that if I pause the video before going fullscreen, it practically never happens. If I pause the video before resizing, the crash is softer (e.g. mpv seems to crash or display a black screen, but the whole system no longer freezes up, and mpv no longer hangs either).
I also noticed that changing the video sync method from display-resample to audio significantly alleviates the issue without entirely eliminating it; at least when just going in and out of fullscreen, it doesn't really happen anymore. The sad part is that this means no interpolation.
Has anybody reported the issue to nvidia?
Issue still the same almost a year later.
I'm on a laptop, and all the issues I originally had were when using PRIME offloading.
But now I have set things up so that my dGPU is the default GPU, so there's no offloading anymore. This solved a bunch of other issues I had (like games hard-crashing my system when vsync was on).
But this particular issue actually got worse (it happens more frequently) under this setup.
I also found this almost certainly related bug report thread on NVIDIA's forums: https://forums.developer.nvidia.com/t/vk-error-device-lost-in-many-game-titles/164513
Also mentioned in that thread is a workaround used by DXVK to prevent this issue: https://github.com/doitsujin/dxvk/commit/16a51f3c03d5bc52ba67101fb5a5cd5b8d96fa94
I would make a test build of mpv to see whether this fixes it here too, but I'm not much of a developer, so I'd probably struggle for days to get it done D:
Issue still the same almost a year later.
It might be a good idea to re-investigate whether the validation layers pick up anything that could explain this. Make sure you have the Vulkan validation layers installed, and then run mpv using --gpu-debug.
Here's a gist: https://gist.github.com/Rabcor/1e58aa97be8545ff387c10d8bb3d65ba
Btw, for me this is happening with the Anime4K shaders, but not with some other shaders.
I think the reason it isn't happening on GL might have something to do with the shaders not loading correctly on OpenGL in the first place (I mean, mpv seems to load them, but they're not actually doing anything).
Btw I did find a workaround of sorts...
If I disable the shaders before resizing the window, the issue naturally won't happen, so I just added this to input.conf:
CTRL+0 no-osd change-list glsl-shaders clr ""; show-text "GLSL shaders cleared"
f no-osd change-list glsl-shaders clr ""; show-text ""; show-text "GLSL shaders cleared!"; cycle fullscreen
MBTN_LEFT_DBL no-osd change-list glsl-shaders clr ""; show-text ""; show-text "GLSL shaders cleared!"; cycle fullscreen
Now I can hit CTRL+0 before resizing the window, and switching to fullscreen never crashes mpv anymore. The extra show-text is there just to add a tiny delay, just in case.
To reapply the shader afterwards I can use something like
CTRL+u apply-profile Upscaling; show-text "Profile Applied: Upscaling"
or
CTRL+1 no-osd change-list glsl-shaders set "~~/shaders/Anime4K_Clamp_Highlights.glsl;~~/shaders/Anime4K_Restore_CNN_Light_VL.glsl;~~/shaders/Anime4K_Upscale_CNN_x2_L.glsl"; show-text "Anime4K: Modern 720p->1080p (HQ)"
A more ideal way would be for it to reload the config after going fullscreen, but I couldn't find a way to do that through input.conf.
Also a note I thought I'd make:
The issue does not occur on Windows with the same config, only Linux, implying that as said before, yes, this is probably an issue in the nvidia driver.
Looks like a GPU hang.
Might be a default settings mismatch. I think I might have managed to fix it with these settings (dunno which one is better):
vulkan-queue-count=1 #1 is default
swapchain-depth=1 #3 is default
Seems like swapchain depth depends on queue count?
Ah, holy shit, this works!
Also I tried some variants like
vulkan-queue-count=3
swapchain-depth=3
And
vulkan-queue-count=3
swapchain-depth=1
But only setting both to 1 actually seemed to solve the issue; no other variation I tested seemed to work.
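To repeat this comparison systematically, one command line per combination can be generated and tried in turn. A small Python sketch; the options and sample URL are the ones used in this thread, while `build_matrix` and the values 1 and 3 are arbitrary illustrative choices.

```python
import itertools
import shlex
from typing import List

SAMPLE = "http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/BigBuckBunny.mp4"
BASE = ["mpv", "--no-config", "--profile=gpu-hq", "--gpu-api=vulkan"]


def build_matrix(queue_counts: List[int], depths: List[int]) -> List[List[str]]:
    """One mpv invocation per (vulkan-queue-count, swapchain-depth) pair."""
    return [
        BASE + [f"--vulkan-queue-count={q}", f"--swapchain-depth={d}", SAMPLE]
        for q, d in itertools.product(queue_counts, depths)
    ]


for argv in build_matrix([1, 3], [1, 3]):
    # Only prints the commands; each could be run with subprocess.run(argv)
    # while resizing the window diagonally to check for a hang.
    print(shlex.join(argv))
```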
I believe queue count is GPU-dependent. I have no idea how to check how many queues my GPU can handle, so I defaulted to 1; the GPU might be the reason the other settings bork out.
edit:
Went and tested other settings as well: they bork out too, while 1 works without issue. ~~Might be async being borked?~~
Anyway, I'm running an NVIDIA GTX 970, drivers: nvidia 470.74-3, KDE (compositor disabled). Might be an NVIDIA issue, unless someone with similar settings and a different GPU runs into the same problem.
edit2: aside from resizing, the issue also shows up quite often, though at random, when leaving fullscreen, which is really just another resize.
edit3: something is special about 1 and 1. I assumed queue count/swapchain depth were settings for vulkan-async-compute/transfer, so I tried disabling those while leaving values other than 1:1; it still breaks.
tl;dr: no idea why, but it needs to be queue count 1 and swapchain depth 1. Anything else I tried borks out.
Can also confirm that setting --vulkan-queue-count=1 and --swapchain-depth=1 fixes the crash for me with my GTX 660 + 470.74 drivers. Full command I used to reproduce:
mpv --no-config --profile=gpu-hq --gpu-api=vulkan --vulkan-queue-count=1 --swapchain-depth=1 http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/BigBuckBunny.mp4
I tried to reproduce this on a Radeon R9 280X with the amdgpu drivers, and it doesn't crash there with the default values.
I found out that, perhaps unsurprisingly, disabling the swapchain logic (technically, setting it to 1 disables it) affects performance somewhat dramatically, particularly for interpolation. My vsync jitter becomes about 10x worse according to the statistics, compared to when it is set to 3.
I figured out a way to sort of combat it.
You cannot change the swapchain depth after launch, and you cannot change the rendering API either: once you have launched mpv with the swapchain depth set to 1, there's literally nothing you can do to set it back to 3 besides closing mpv and launching it again with the swapchain depth set to 3 from the start.
My workaround is to set the swapchain depth to 1 only in windowed mode, leaving it at its default (3) otherwise, and to configure mpv to always launch in fullscreen with the keybinds for exiting fullscreen disabled, like this:
in mpv.conf:
fullscreen=yes
input-conf= ~/.config/mpv/nofscycle.conf
#Other settings...
[Linux-Windowed]
profile-cond=package.config:sub(1,1) == '/' and not fullscreen
swapchain-depth=1 # Bypass issue with nvidia driver hangs on window resize, this setting reduces performance. Launch with --fullscreen argument or set fullscreen=yes to avoid.
nofscycle.conf:
MBTN_LEFT_DBL ignore # Disable cycling between fullscreen & windowed
f ignore # Disable cycling between fullscreen & windowed
ESC quit # change esc from exit fullscreen to quit mpv
With this setup, I can either launch mpv with mpv --fullscreen=no --input-conf="" if I want to run mpv in a resizable window, or I can comment out those first two lines I put in mpv.conf.
It's not ideal, but since I mostly watch things in fullscreen anyways, I am more or less getting the best of both worlds here.
Also, for people who mostly use windowed mode, it's easy to flip the logic: just remove the first two lines I put in mpv.conf and only use the profile, then launch mpv --fullscreen=yes --input-conf=~/.config/mpv/nofscycle.conf when you want to go fullscreen.
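For readers unfamiliar with the profile-cond line above: it is a Lua expression, and the first character of Lua's `package.config` is the directory separator (`/` everywhere except Windows, where it is `\`). So [Linux-Windowed] applies on non-Windows systems whenever mpv is not fullscreen, and since swapchain-depth can't change after launch, only the state at startup really matters. The same decision spelled out in Python, purely as an illustration; `windowed_swapchain_depth` is a hypothetical name, and its `sep` argument stands in for `package.config:sub(1,1)`.

```python
import os
from typing import Optional


def windowed_swapchain_depth(sep: str = os.sep, fullscreen: bool = False) -> Optional[int]:
    """Mirror of: profile-cond=package.config:sub(1,1) == '/' and not fullscreen.

    Returns the swapchain-depth the [Linux-Windowed] profile would force,
    or None when the profile does not apply (mpv's default of 3 is kept).
    """
    if sep == "/" and not fullscreen:
        return 1
    return None


print(windowed_swapchain_depth("/", fullscreen=False))   # Linux, windowed
print(windowed_swapchain_depth("/", fullscreen=True))    # Linux, fullscreen
print(windowed_swapchain_depth("\\", fullscreen=False))  # Windows
```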
I've encountered this issue on Manjaro with the 470 and 495 drivers. I tested it now, after the 510.47.03 release, with this command:
mpv --no-config --profile=gpu-hq --gpu-api=vulkan http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/BigBuckBunny.mp4
And the issue seems to be resolved.
If this is indeed fixed on R510, hopefully it gets backported to R470 because it's the last driver branch that will support my GPU. :c
I can still reproduce this in 510.47.03. It seems a lot less likely to happen, but still possible.
dmesg message:
[Feb 6 23:49] NVRM: Xid (PCI:0000:2d:00): 31, pid=30643, Ch 00000034, intr 00000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_PROP_0 faulted @ 0x0_1a204000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ
Log with --gpu-debug: https://fars.ee/YqBG
This isn't really resolved. I can second OceanS2000 on that: it seems a lot less likely to happen, but it still does on the 510 drivers when there's no config.
But when I'm using certain aforementioned shaders, it's just as bad as before. Honestly, I think I already mentioned earlier that it rarely happens when not using shaders.
Is there any way to solve this problem temporarily?
It could probably be worked around by forcing all resizes to be synchronous. I guess I could add a debug flag for that.
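To illustrate what "forcing all resizes to be synchronous" could mean: before the swapchain is rebuilt at the new size, wait for every frame already submitted to the GPU to finish, instead of rebuilding while older frames are still in flight. The Python model below is hypothetical (`SwapchainModel` and its methods are invented, and this is not how libplacebo is structured); the assumption behind it, not confirmed anywhere in this thread, is that the crash comes from in-flight work touching resources of the old swapchain.

```python
from collections import deque
from typing import Deque, List, Tuple


class SwapchainModel:
    """Hypothetical model: frames are queued to the 'GPU' and retire in order."""

    def __init__(self, size: Tuple[int, int]) -> None:
        self.size = size
        self.in_flight: Deque[int] = deque()
        self.events: List[str] = []

    def submit(self, frame: int) -> None:
        self.in_flight.append(frame)
        self.events.append(f"submit {frame} @ {self.size[0]}x{self.size[1]}")

    def retire_oldest(self) -> None:
        # Stand-in for blocking on the oldest frame's fence.
        frame = self.in_flight.popleft()
        self.events.append(f"retire {frame}")

    def resize_sync(self, new_size: Tuple[int, int]) -> None:
        # Synchronous resize: drain all in-flight work first, so (by the
        # assumption above) nothing still uses the old swapchain when it goes.
        while self.in_flight:
            self.retire_oldest()
        self.size = new_size
        self.events.append(f"recreate @ {new_size[0]}x{new_size[1]}")


sc = SwapchainModel((1280, 720))
for f in range(3):
    sc.submit(f)
sc.resize_sync((1300, 740))
sc.submit(3)
print("\n".join(sc.events))
```

The cost is a stall on every resize, which is presumably why it was suggested as a debug flag rather than the default behavior.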
I'll try, thank you very much!
Can you guys retest this? I can't seem to reproduce this anymore, which is great, especially because my GPU will only be getting limited driver updates...
- OS: Arch Linux
- mpv: 0.34.1
- GPU: GTX 660
- Drivers: 470.141.03
$ mpv --help
mpv 0.34.1-dirty Copyright © 2000-2021 mpv/MPlayer/mplayer2 projects
built on UNKNOWN
FFmpeg library versions:
libavutil 57.17.100
libavcodec 59.18.100
libavformat 59.16.100
libswscale 6.4.100
libavfilter 8.24.100
libswresample 4.3.100
FFmpeg version: n5.0.1
Windows 10 here. mpv kept crashing and eventually froze my entire OS when I attempted to resize a window. I jumped from 0.34.0 to 0.35.0-15 after that, and resizing works more reliably now, but mpv can still crash... at least it doesn't seem to freeze my system anymore.
I can no longer reproduce this issue either.
$ mpv --version
mpv 0.35.0 Copyright © 2000-2022 mpv/MPlayer/mplayer2 projects
built on UNKNOWN
We should probably close this thread. I don't know how or when, but it seems like it's been resolved. I'm guessing it was an NVIDIA driver update that resolved it rather than anything mpv did.
$ nvidia-smi
Sun Jan 1 15:43:48 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.11 Driver Version: 525.60.11 CUDA Version: 12.0 |
I still had the issue on the 510 drivers, so something between 510 and 525 must have fixed it.
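Narrowing down "something between 510 and 525" starts with comparing reported driver versions correctly. A small Python sketch that pulls the version out of an nvidia-smi header line like the one quoted above and compares it numerically; the regex is fitted to that header, and the helper names and bounds are illustrative.

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, ...]

# Matches the "Driver Version: 525.60.11" field of the nvidia-smi header.
DRIVER_RE = re.compile(r"Driver Version:\s*(\d+(?:\.\d+)*)")


def parse_driver_version(smi_header: str) -> Optional[Version]:
    """Return the driver version as a tuple of ints, or None if absent."""
    m = DRIVER_RE.search(smi_header)
    return tuple(int(p) for p in m.group(1).split(".")) if m else None


def in_range(v: Version, low: Version, high: Version) -> bool:
    """True if low < v <= high, comparing component by component."""
    return low < v <= high


header = "| NVIDIA-SMI 525.60.11    Driver Version: 525.60.11    CUDA Version: 12.0     |"
v = parse_driver_version(header)
assert v is not None
print(v, in_range(v, (510, 47, 3), (525, 60, 11)))
```

Comparing integer tuples instead of strings matters here: as plain strings, "470.141.03" sorts before "470.74", even though it is the newer release.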
It still happens for me, using a git build of mpv and the 525.60.11 NVIDIA driver.
$ mpv --version
mpv 0.35.0-67-gad65c8855b Copyright © 2000-2022 mpv/MPlayer/mplayer2 projects
built on Sat Dec 31 02:01:04 2022
libplacebo version: v5.229.1-49-g22ce304
FFmpeg version: n5.1.2
FFmpeg library versions:
libavutil 57.28.100
libavcodec 59.37.100
libavformat 59.27.100
libswscale 6.7.100
libavfilter 8.44.100
libswresample 4.7.100
For me, the above suggestions did not fix it (mpv still hangs on window resize with the error [vo/gpu-next/libplacebo] vkQueueSubmit: VK_ERROR_DEVICE_LOST (../src/vulkan/command.c:390)):
vulkan-queue-count=1
swapchain-depth=1
NVIDIA driver version:
Fri Jan 6 20:17:35 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.13 Driver Version: 525.60.13 CUDA Version: N/A |
|-------------------------------+----------------------+----------------------+
mpv version:
mpv 0.35.0 Copyright © 2000-2022 mpv/MPlayer/mplayer2 projects
built on Sat Nov 12 15:31:46 UTC 2022
FFmpeg library versions:
libavutil 57.28.100
libavcodec 59.37.100
libavformat 59.27.100
libswscale 6.7.100
libavfilter 8.44.100
libswresample 4.7.100
FFmpeg version: 5.1.2
Seems this error is not limited to NVIDIA cards: https://github.com/mpv-player/mpv/issues/10425