
Getting VK_ERROR_DEVICE_LOST upon resizing mpv's window

Open guihkx opened this issue 3 years ago • 26 comments

Important Information

Provide following Information:

  • mpv version: 0.33.0
  • Linux Distribution and Version: Arch Linux
  • Source of the mpv binary: Official repositories
  • If known which version of mpv introduced the problem: I was getting this crash since v0.32.0, but searching through the issues in the mpv repository, I found a post by a developer who said the libplacebo version in the official Arch Linux repositories was too old (and it was indeed), so I waited until v0.33.0 was officially released (therefore with a newer version of libplacebo), but the issue persisted.
  • Window Manager and version: mutter 3.38.2
  • GPU driver and version: NVIDIA 455.45.01 (proprietary driver)
  • Possible screenshot or video of visual glitches: https://www.youtube.com/watch?v=lG8X09lm6zE

Reproduction steps

First, let me say I'm not sure if I should've opened an issue here, or in the libplacebo repo, so feel free to close this if it's in the wrong place.

Anyway, I could only reproduce this when combining --profile=gpu-hq with --gpu-api=vulkan. To reproduce this, pick a random video (the download link for the one I used is at the end of this post), then run mpv in a terminal window, like this:

$ mpv --no-config --profile=gpu-hq --gpu-api=vulkan BigBuckBunny.mp4

After that, start resizing the mpv window diagonally (I couldn't reproduce it when resizing horizontally or vertically), as fast as you can: mpv will freeze (and sometimes your whole desktop will too, for a brief moment).

mpv will also spam this following message in the console:

[vo/gpu/vulkan/libplacebo] vk->QueueSubmit(cmd->queue, 1, &sinfo, cmd->fence): VK_ERROR_DEVICE_LOST
[vo/gpu/vulkan/libplacebo] vk->QueueSubmit(cmd->queue, 1, &sinfo, cmd->fence): VK_ERROR_DEVICE_LOST
[vo/gpu/vulkan/libplacebo] vk->QueueSubmit(cmd->queue, 1, &sinfo, cmd->fence): VK_ERROR_DEVICE_LOST
[vo/gpu/vulkan/libplacebo] vk->QueueSubmit(cmd->queue, 1, &sinfo, cmd->fence): VK_ERROR_DEVICE_LOST
[vo/gpu/vulkan/libplacebo] vk->QueueSubmit(cmd->queue, 1, &sinfo, cmd->fence): VK_ERROR_DEVICE_LOST
[vo/gpu/vulkan/libplacebo] vk->QueueSubmit(cmd->queue, 1, &sinfo, cmd->fence): VK_ERROR_DEVICE_LOST
[vo/gpu/vulkan/libplacebo] vk->QueueSubmit(cmd->queue, 1, &sinfo, cmd->fence): VK_ERROR_DEVICE_LOST
[vo/gpu/vulkan/libplacebo] vk->QueueSubmit(cmd->queue, 1, &sinfo, cmd->fence): VK_ERROR_DEVICE_LOST
[vo/gpu/vulkan/libplacebo] vk->QueueSubmit(cmd->queue, 1, &sinfo, cmd->fence): VK_ERROR_DEVICE_LOST
[vo/gpu/vulkan/libplacebo] vk->QueueSubmit(cmd->queue, 1, &sinfo, cmd->fence): VK_ERROR_DEVICE_LOST
[vo/gpu/vulkan/libplacebo] Failed holding swapchain image for presentation
[vo/gpu] Failed presenting frame!

And if you have journalctl -f open (and an NVIDIA GPU), this Xid error will show up:

dec 03 19:22:05 arch kernel: NVRM: Xid (PCI:0000:01:00): 31, pid=83961, Ch 00000022, intr 10000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_PROP_0 faulted @ 0x1_016a1000. Fault is of type FAULT_PTE ACCESS_TYPE_WRITE
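To watch for the same fault on another machine, the Xid number can be pulled out of kernel log lines like the one above. This is only a sketch: the helper name xid_code is made up for illustration, and the pattern assumes the "NVRM: Xid (<bus id>): <number>," layout seen in this report.

```shell
# Hypothetical helper (not part of mpv or the NVIDIA driver):
# reads kernel log lines on stdin and prints the numeric Xid code
# of every "NVRM: Xid" report. All other lines are ignored.
xid_code() {
    sed -n 's/.*NVRM: Xid ([^)]*): \([0-9][0-9]*\),.*/\1/p'
}
```

For example, `journalctl -k | xid_code | sort | uniq -c` would tally the Xid codes logged since boot, which makes it easier to tell whether each freeze produces the same code.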

Expected behavior

mpv should not freeze, and the NVIDIA driver should not throw an Xid error.

Actual behavior

mpv freezes and the NVIDIA driver throws an Xid error.

Log file

https://gist.githubusercontent.com/guihkx/97e8437eb059868623564f18156d667b/raw/7d51f3bed46bc72661b8c1d81c30b4ff5c518427/log.txt

Sample files

http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/BigBuckBunny.mp4

guihkx avatar Dec 03 '20 22:12 guihkx

Can confirm the issue. There is no need to resize "as fast you can"; the issue can appear on any resize, more or less at random. Also, it's extremely likely that this is not mpv or libplacebo's fault but NVIDIA's bad drivers.

sfan5 avatar Dec 03 '20 22:12 sfan5

there is no need to resize "as fast you can"

Indeed! Today I was watching a video with my second screen enabled, and as soon as I clicked and dragged just a little to start resizing the window, the crash happened. It just seemed easier to reproduce by resizing it quickly, though. >:)

Can you still reproduce a crash/freeze without using --profile=gpu-hq, though? What about using the OpenGL back-end? I couldn't in either case...

guihkx avatar Dec 03 '20 22:12 guihkx

I use gpu-hq + OpenGL daily and I've never seen a similar thing happen with that.

sfan5 avatar Dec 03 '20 22:12 sfan5

Also, it's extremely likely that this is not mpv or libplacebo's fault but NVIDIA's bad drivers.

It's also extremely likely that mpv and/or libplacebo would benefit from having better code to recover from device loss, since that can happen for entirely legitimate reasons as well.

As far as that distinction is concerned, I'm convinced libplacebo is, essentially, doing the correct thing. It's forwarding the error to vo_gpu's draw_frame as the return value of pl_swapchain_submit_frame, which the documentation states indicates some sort of severe failure (e.g. device loss).

So in terms of an actual change that needs to be made, it's most likely on the mpv side. More specifically, mpv needs some mechanism of signalling that the entire gpu context needs to be recreated (possibly as a new VOCTRL of some sort)

haasn avatar Dec 04 '20 04:12 haasn

So an update to their beta Vulkan driver was released yesterday:

Vulkan Beta Driver Release Updates January 27th, 2021 - Windows 457.88, Linux 455.50.04

  • Fixes:
    • Fixed a bug in a stencil-buffer optimization that could occasionally result in VK_ERROR_DEVICE_LOST

Before I waste my time downloading, packaging and installing this beta driver, do you guys think that fix could be related to this bug?

Cheers.

guihkx avatar Jan 28 '21 06:01 guihkx

Unlikely. We don't use stencil buffers. (But maybe they do, internally. Or maybe the bug affects more than just stencil buffers.)

haasn avatar Jan 28 '21 11:01 haasn

Yep, it's definitely not fixed. Attaching a new log file nonetheless, since mpv's output appears to have changed slightly:

https://gist.github.com/guihkx/d32c0eeaf782ae2f0140de4c87e875a9/raw/5d33de105a7b141e9b071134bcb3e3dd84b0ba01/log.txt

NVIDIA driver 455.50.04 mpv 0.33.0 libplacebo 3.104.0

guihkx avatar Jan 28 '21 14:01 guihkx

I can confirm I have the same issue. It's not just resizing the window: going fullscreen or exiting fullscreen also triggers the error very frequently.

I have noticed that if I pause the video prior to going fullscreen it seems to practically never happen. If I pause the video prior to resizing, the crash is softer (e.g. MPV seems to crash/display black screen but the whole system doesn't freeze up anymore, and mpv doesn't hang anymore either)

I also noticed that if I change the video sync method from display-resample to audio, it significantly alleviates the issue but doesn't entirely eliminate it, but at least if it's just going in and out of fullscreen, it doesn't really happen anymore. Sad part is that this means no interpolation.

Has anybody reported the issue to nvidia?

Rabcor avatar Jan 30 '21 01:01 Rabcor

Issue still the same almost a year later.

I'm on a laptop, and all the issues I had originally were when using PRIME Offloading

But now I have set things up so that my dGPU is the default GPU, so no offloading anymore. This solved a bunch of other issues I had (like games hard crashing my system when vsync was on)

But this particular issue actually got worse (happens more frequently) under this setup.

Also found this almost certainly related bug report thread on nvidia's forums: https://forums.developer.nvidia.com/t/vk-error-device-lost-in-many-game-titles/164513

Also mentioned in that thread is a workaround used by DXVK to prevent said issue: https://github.com/doitsujin/dxvk/commit/16a51f3c03d5bc52ba67101fb5a5cd5b8d96fa94

I would make a test build of mpv and see if this fixes it here too or not but I'm not much of a developer so I'd probably struggle for days to get it done D:

Rabcor avatar Sep 15 '21 02:09 Rabcor

Issue still the same almost a year later.

It might be a good idea to re-investigate whether the validation layers pick up anything that could explain this. Make sure you have the vulkan validation layers installed and then run mpv using --gpu-debug.
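A quick way to check the first half of that advice is to look for the validation layer in the loader's layer list before starting mpv. This is a sketch under assumptions: the helper name has_validation_layer is hypothetical, it assumes the layer is registered as VK_LAYER_KHRONOS_validation, and the usage line assumes vulkaninfo (from vulkan-tools) is installed.

```shell
# Hypothetical helper: reads a Vulkan layer listing on stdin and
# succeeds (exit status 0) only if the Khronos validation layer
# is mentioned in it.
has_validation_layer() {
    grep -q 'VK_LAYER_KHRONOS_validation'
}
```

It could be used as `vulkaninfo | has_validation_layer && mpv --no-config --profile=gpu-hq --gpu-api=vulkan --gpu-debug --log-file=mpv-gpu-debug.log BigBuckBunny.mp4`, so that mpv only starts when the layer is actually present and any validation messages end up in the log file.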

haasn avatar Sep 15 '21 06:09 haasn

Issue still the same almost a year later.

It might be a good idea to re-investigate whether the validation layers pick up anything that could explain this. Make sure you have the vulkan validation layers installed and then run mpv using --gpu-debug.

here's a gist: https://gist.github.com/Rabcor/1e58aa97be8545ff387c10d8bb3d65ba

Btw for me this is happening with the anime4k shaders, but not some other shaders.

I think the reason it isn't happening on GL might have something to do with the shaders not loading correctly on OpenGL in the first place (I mean mpv seems to load them but they're not actually doing anything).

Rabcor avatar Sep 15 '21 23:09 Rabcor

Btw I did find a workaround of sorts...

If I disable the shaders before resizing the window the issue naturally won't happen, so I just added this to input.conf

CTRL+0 no-osd change-list glsl-shaders clr ""; show-text "GLSL shaders cleared"
f no-osd change-list glsl-shaders clr ""; show-text ""; show-text "GLSL shaders cleared!"; cycle fullscreen
MBTN_LEFT_DBL no-osd change-list glsl-shaders clr ""; show-text ""; show-text "GLSL shaders cleared!"; cycle fullscreen

With this, I can hit CTRL+0 before resizing the window, and switching to fullscreen never crashes mpv anymore. The extra show-text is there just to add a tiny extra delay, just in case.

To reapply the shader afterwards I can use something like

CTRL+u apply-profile Upscaling; show-text "Profile Applied: Upscaling"

or

CTRL+1 no-osd change-list glsl-shaders set "~~/shaders/Anime4K_Clamp_Highlights.glsl;~~/shaders/Anime4K_Restore_CNN_Light_VL.glsl;~~/shaders/Anime4K_Upscale_CNN_x2_L.glsl"; show-text "Anime4K: Modern 720p->1080p (HQ)"

A more ideal way would be for it to reload the config after going fullscreen, but I couldn't find a way to do that through input.conf.

Also a note I thought I'd make:

The issue does not occur on Windows with the same config, only Linux, implying that as said before, yes, this is probably an issue in the nvidia driver.

Rabcor avatar Sep 16 '21 18:09 Rabcor

Looks like a GPU hang.

haasn avatar Sep 16 '21 18:09 haasn

might be a default settings mismatch, i think i might have managed to fix it with this setting (dunno which one is better)

vulkan-queue-count=1 #1 is default
swapchain-depth=1 #3 is default

seems like swapchain depth is reliant on queue-count ?

ashtonx avatar Sep 24 '21 19:09 ashtonx

might be a default settings mismatch, i think i might have managed to fix it with this setting (dunno which one is better)

vulkan-queue-count=1 #1 is default
swapchain-depth=1 #3 is default

seems like swapchain depth is reliant on queue-count ?

Ah, holy shit, this works!

Also I tried some variants like

vulkan-queue-count=3
swapchain-depth=3

And

vulkan-queue-count=3
swapchain-depth=1

But only setting both to 1 actually seemed to solve the issue, no other variation on the settings I tested seemed to work.
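For anyone who wants to repeat the comparison on their own hardware, the combinations above (plus the defaults, 1 and 3) can be written out as ready-to-run commands. This is a sketch; the function name print_test_matrix is made up here, and nothing is executed, the commands are only printed.

```shell
# Hypothetical helper: prints one mpv command line per
# (vulkan-queue-count, swapchain-depth) pair, using the values
# 1 and 3 discussed in this thread. $1 is the video to play.
print_test_matrix() {
    video=$1
    for q in 1 3; do
        for d in 1 3; do
            printf 'mpv --no-config --profile=gpu-hq --gpu-api=vulkan --vulkan-queue-count=%s --swapchain-depth=%s %s\n' \
                "$q" "$d" "$video"
        done
    done
}
```

Running `print_test_matrix BigBuckBunny.mp4` prints four command lines; each one can be pasted into a terminal and the window resized to see which pairs still trigger the freeze.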

Rabcor avatar Oct 01 '21 19:10 Rabcor

I believe queue count is GPU dependent. I have no idea how to check how many queues my GPU can handle, so I defaulted to 1; the GPU might be the reason other settings bork out.

edit: went and tested, other settings as well.. borks out as well, 1 works without issue, ~~might be async being borked?~~ anyway running nvidia gtx 970, drivers: nvidia 470.74-3, kde (disabling compositor). might be nvidia issue, unless someone with similar settings different gpu runs into same problem.

edit2: also aside from resizing, issues shows up when leaving full screen, though that's due to resize, quite often though at random.

edit3: there is something about 1 and 1. I assumed queue count/swapchain depth were settings for vulkan-async-compute/transfer, so I tried disabling those while leaving values other than 1:1, and it still breaks.

tl;dr no idea why but needs to be queue count 1, swapchain depth 1. Anything else i tried borks out.

ashtonx avatar Oct 01 '21 21:10 ashtonx

Can also confirm that setting --vulkan-queue-count=1 and --swapchain-depth=1 fixes the crash for me with my GTX 660 + 470.74 drivers. Full command I used to reproduce:

mpv --no-config --profile=gpu-hq --gpu-api=vulkan --vulkan-queue-count=1 --swapchain-depth=1 http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/BigBuckBunny.mp4
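To compare runs with and without the two options more objectively, each run can be logged with mpv's --log-file option and the device-lost messages counted afterwards. This is a sketch; count_device_lost is a hypothetical name, and it assumes the log file contains the same VK_ERROR_DEVICE_LOST text that mpv prints to the terminal.

```shell
# Hypothetical helper: prints how many lines of the given mpv log
# file mention VK_ERROR_DEVICE_LOST. Prints 0 when there are none.
count_device_lost() {
    # grep -c exits non-zero on zero matches, so mask that status.
    grep -c 'VK_ERROR_DEVICE_LOST' "$1" || true
}
```

For instance, add `--log-file=run-1-1.log` to the command above, resize the window for a while, quit, and then run `count_device_lost run-1-1.log`; a count of 0 suggests the workaround held for that run.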

I tried to reproduce this on a Radeon R9 280x with amdgpu drivers and it doesn't crash there with the default values.

guihkx avatar Oct 02 '21 04:10 guihkx

I found out (perhaps unsurprisingly) that disabling the swapchain logic (setting it to 1 disables it, technically) affects performance fairly dramatically, particularly for interpolation. My vsync jitter becomes about 10x worse according to statistics, compared to when it is set to 3.

I figured out a way to sorta combat it.

You cannot change swapchain-depth after launch, and you cannot change the rendering API either. Once you have launched mpv with the swapchain depth set to 1, there's literally nothing you can do to set it to 3 again besides quitting and relaunching with the swapchain depth set to 3 from the start.

My workaround for this is to set swapchain depth to 1 when in windowed mode, otherwise letting it remain at its default (3), and just configuring mpv to always launch in fullscreen and disabling the keybinds to exit fullscreen, like this:

in mpv.conf:

fullscreen=yes            
input-conf= ~/.config/mpv/nofscycle.conf

#Other settings...

[Linux-Windowed]
profile-cond=package.config:sub(1,1) == '/' and not fullscreen
swapchain-depth=1                                            # Bypass issue with nvidia driver hangs on window resize, this setting reduces performance. Launch with --fullscreen argument or set fullscreen=yes to avoid.

nofscycle.conf:

MBTN_LEFT_DBL ignore # Disable cycling between fullscreen & windowed
f ignore # Disable cycling between fullscreen & windowed
ESC quit # change esc from exit fullscreen to quit mpv

With this setup, I can either launch mpv with mpv --fullscreen=no --input-conf="" if I want to run mpv in a resizable window, or I can comment out those first two lines I put in mpv.conf.

It's not ideal, but since I mostly watch things in fullscreen anyways, I am more or less getting the best of both worlds here.

Also for people who mostly use windowed it's easy to flip the logic, just remove the first two lines I put in mpv.conf and only use the profile, then launch mpv --fullscreen=yes --input-conf=~/.config/mpv/nofscycle.conf when you want to go fullscreen.

Rabcor avatar Oct 18 '21 15:10 Rabcor

I've encountered this issue on Manjaro with the 470 and 495 drivers. I tested it just now, after the 510.47.03 release, with this command:

mpv --no-config --profile=gpu-hq --gpu-api=vulkan http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/BigBuckBunny.mp4

And the issue seems to be resolved.

Gigas002 avatar Feb 02 '22 12:02 Gigas002

If this is indeed fixed on R510, hopefully it gets backported to R470 because it's the last driver branch that will support my GPU. :c

guihkx avatar Feb 02 '22 21:02 guihkx

I can still reproduce this in 510.47.03. It seems a lot less likely to happen, but still possible.

dmesg message:

[Feb 6 23:49] NVRM: Xid (PCI:0000:2d:00): 31, pid=30643, Ch 00000034, intr 00000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_PROP_0 faulted @ 0x0_1a204000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ

Log with --gpu-debug: https://fars.ee/YqBG

OceanS2000 avatar Feb 06 '22 15:02 OceanS2000

This isn't really resolved. I can second OceanS2000 on that: it seems a lot less likely to happen, but it still does on the 510 drivers when there's no config.

But when I'm using certain aforementioned shaders it's just as bad as before. Honestly, I think I already mentioned earlier that it rarely happens when not using shaders.

Rabcor avatar Mar 10 '22 20:03 Rabcor

Is there any way to solve this problem temporarily?

Elhorses avatar Jun 17 '22 09:06 Elhorses

Is there any way to solve this problem temporarily?

It could probably be worked around by forcing all resizes to be synchronous. I guess I could add a debug flag for that.

haasn avatar Jun 17 '22 13:06 haasn

Is there any way to solve this problem temporarily?

It could probably be worked around by forcing all resizes to be synchronous. I guess I could add a debug flag for that.

I'll try, thank you very much!

Elhorses avatar Jun 19 '22 05:06 Elhorses

Can you guys retest this? I can't seem to reproduce this anymore, which is great, especially because my GPU will only be getting limited driver updates...

  • OS: Arch Linux
  • mpv: 0.34.1
  • GPU: GTX 660
  • Drivers: 470.141.03
$ mpv --help
mpv 0.34.1-dirty Copyright © 2000-2021 mpv/MPlayer/mplayer2 projects
 built on UNKNOWN
FFmpeg library versions:
   libavutil       57.17.100
   libavcodec      59.18.100
   libavformat     59.16.100
   libswscale      6.4.100
   libavfilter     8.24.100
   libswresample   4.3.100
FFmpeg version: n5.0.1

guihkx avatar Aug 13 '22 04:08 guihkx

Windows 10 here. Mpv kept crashing and eventually froze my entire OS when I attempted to resize a window. I jumped from 0.34.0 to 0.35.0-15 after that, and resizing works more reliably now, but mpv can still crash... at least it doesn't seem to freeze my system anymore.

BlizzBlu avatar Nov 26 '22 17:11 BlizzBlu

Can you guys retest this? I can't seem to reproduce this anymore, which is great, especially because my GPU will only be getting limited drivers updates...

  • OS: Arch Linux
  • mpv: 0.34.1
  • GPU: GTX 660
  • Drivers: 470.141.03
$ mpv --help
mpv 0.34.1-dirty Copyright © 2000-2021 mpv/MPlayer/mplayer2 projects
 built on UNKNOWN
FFmpeg library versions:
   libavutil       57.17.100
   libavcodec      59.18.100
   libavformat     59.16.100
   libswscale      6.4.100
   libavfilter     8.24.100
   libswresample   4.3.100
FFmpeg version: n5.0.1

I can no longer reproduce this issue either.

$ mpv --version
mpv 0.35.0 Copyright © 2000-2022 mpv/MPlayer/mplayer2 projects
 built on UNKNOWN

We should probably close this thread, I don't know how or when but it seems like it's been resolved. I'm guessing it was an nvidia driver update that resolved it rather than anything MPV did.

$ nvidia-smi
Sun Jan  1 15:43:48 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.11    Driver Version: 525.60.11    CUDA Version: 12.0     |

I still had the issue with the 510 drivers, so something between 510 and 525 must have solved it.

Rabcor avatar Jan 01 '23 15:01 Rabcor

It still happens for me. Using git build of mpv and 525.60.11 nvidia driver.

$ mpv --version
mpv 0.35.0-67-gad65c8855b Copyright © 2000-2022 mpv/MPlayer/mplayer2 projects
 built on Sat Dec 31 02:01:04 2022
libplacebo version: v5.229.1-49-g22ce304
FFmpeg version: n5.1.2
FFmpeg library versions:
   libavutil       57.28.100
   libavcodec      59.37.100
   libavformat     59.27.100
   libswscale      6.7.100
   libavfilter     8.44.100
   libswresample   4.7.100

Gigas002 avatar Jan 01 '23 15:01 Gigas002

for me, above suggestions did not fix it (still having mpv hang on window resize with error [vo/gpu-next/libplacebo] vkQueueSubmit: VK_ERROR_DEVICE_LOST (../src/vulkan/command.c:390)):

vulkan-queue-count=1
swapchain-depth=1

nvidia driver version:

Fri Jan  6 20:17:35 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.13    Driver Version: 525.60.13    CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+

mpv version:

mpv 0.35.0 Copyright © 2000-2022 mpv/MPlayer/mplayer2 projects
 built on Sat Nov 12 15:31:46 UTC 2022
FFmpeg library versions:
   libavutil       57.28.100
   libavcodec      59.37.100
   libavformat     59.27.100
   libswscale      6.7.100
   libavfilter     8.44.100
   libswresample   4.7.100
FFmpeg version: 5.1.2

Seems this error is not limited to nVidia cards: https://github.com/mpv-player/mpv/issues/10425

SjoerdV avatar Jan 06 '23 19:01 SjoerdV