Video decode does not work after exiting sleep.
Firefox 98.0a1 (2022-01-23) Nvidia drivers: 510.39.01
Video decode works properly after boot, but fails with the following log after entering sleep and exiting it:
libva info: VA-API version 1.13.0
libva info: User environment variable requested driver 'nvidia'
libva info: Trying to open /usr/lib64/va/drivers/nvidia_drv_video.so
[329498-329517] ../nvidia-vaapi-driver-0.0.3/src/vabackend.c: 84 init cuda error 'unknown error' (999)
Seems to be a memory leak issue from Nvidia: https://gitlab.gnome.org/GNOME/mutter/-/issues/2045
Driver 495 has this issue as well as another issue with hardware cursor.
A 999 error is pretty generic. Can you see if there's anything in dmesg? Another tip is to try running nvidia-modprobe, though I doubt that'll help in this case.
I'm not sure that mutter issue is related.
Here are the messages from dmesg that may be relevant:
[ 81.641775] NVRM: GPU at PCI:0000:01:00: GPU-267f0caa-10d4-f044-5b1a-6a5470fcfadb
[ 81.641779] NVRM: Xid (PCI:0000:01:00): 31, pid=1081, Ch 00000002, intr 10000000. MMU Fault: ENGINE HOST6 HUBCLIENT_HOST faulted @ 0x1_01010000. Fault is of type FAULT_PDE ACCESS_TYPE_READ
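For anyone else collecting the same information, here is a minimal sketch of filtering a kernel log down to the NVIDIA driver lines. The function names `nvrm_messages` and `xid_codes` are made up for illustration, and the pattern assumes the Xid line format shown in the dmesg output above.

```shell
#!/bin/sh
# Keep only NVIDIA kernel-driver messages (NVRM lines, including Xid reports)
# from a kernel log read on standard input.
nvrm_messages() {
    grep -E 'NVRM:'
}

# Extract just the Xid numbers, one per line, from lines shaped like
# "NVRM: Xid (PCI:0000:01:00): 31, pid=..." (format assumed from the log above).
xid_codes() {
    sed -n 's/.*NVRM: Xid ([^)]*): \([0-9][0-9]*\),.*/\1/p'
}

# Usage (reading dmesg may need root on some systems):
#   sudo dmesg | nvrm_messages
#   sudo dmesg | xid_codes | sort -u
```

Posting the output of the first command together with the driver version is usually enough to compare against other reports.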
I've managed to reproduce this locally, with exactly the same error in dmesg.
There's a post on the NVIDIA forum with this issue, that I've replied to.
In that post there was a work-around, that did work for me: rmmod nvidia_uvm; modprobe nvidia_uvm.
@elFarto That workaround fixed it for me as well, and actually it helped resolve some issues I was experiencing with hardware acceleration in general not working after suspend, in things like Electron apps.
Same here, after switching TTY and running rmmod nvidia-uvm -f && modprobe nvidia_uvm, graphical issues are resolved. How annoying.
You could try the systemd services from the nvidia driver. They helped me resolve a similar issue before.
@danyeet, that worked. I'd used them before, but ran into issues on a previous update and disabled them. Now enabled nvidia-suspend.service and suspend is working as intended.
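A quick way to see which of the driver's power-management units are enabled is sketched below. The unit names are the ones mentioned in this thread; the function name `nvidia_pm_status` and the `SYSTEMCTL` override are hypothetical conveniences, not part of the driver.

```shell
#!/bin/sh
# Report whether each NVIDIA power-management unit named in this thread is enabled.
# SYSTEMCTL is a hypothetical override for testing; it defaults to systemctl.
nvidia_pm_status() {
    for unit in nvidia-suspend.service nvidia-hibernate.service nvidia-resume.service; do
        # is-enabled prints the state and exits non-zero for disabled units,
        # so only the printed text is used here.
        state=$("${SYSTEMCTL:-systemctl}" is-enabled "$unit" 2>/dev/null)
        printf '%s: %s\n' "$unit" "${state:-unknown}"
    done
}

# Usage:
#   nvidia_pm_status
# To enable all three:
#   sudo systemctl enable nvidia-suspend.service nvidia-hibernate.service nvidia-resume.service
```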
nvidia-suspend.service has been enabled for me, and the issue is still persistent, unfortunately.
Info: Ubuntu 21.10, driver 510, 1080 Ti
In that post there was a work-around, that did work for me: rmmod nvidia_uvm; modprobe nvidia_uvm.
This workaround works for me, except that if Firefox is running, the nvidia_uvm module is shown as being in use (even when no video is playing). Only after I close Firefox am I able to rmmod nvidia_uvm.
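The in-use check can be scripted before attempting the reload. This is only a sketch: `module_refcount` and `reload_nvidia_uvm` are illustrative names, and it assumes `fuser` (from psmisc) is installed and that the device node is `/dev/nvidia-uvm`.

```shell
#!/bin/sh
# Print the reference count of a kernel module from a /proc/modules-style file.
# Fields there are: name, size, refcount, dependents, state, address.
module_refcount() {
    awk -v m="$1" '$1 == m { print $3; found = 1 } END { exit !found }' \
        "${2:-/proc/modules}"
}

# Reload nvidia_uvm only when nothing holds it; otherwise list the holders.
# Needs root for rmmod/modprobe.
reload_nvidia_uvm() {
    refs=$(module_refcount nvidia_uvm) || {
        echo "nvidia_uvm is not loaded" >&2
        return 1
    }
    if [ "$refs" -gt 0 ]; then
        echo "nvidia_uvm is in use ($refs references); close these first:" >&2
        fuser -v /dev/nvidia-uvm >&2   # assumed device path
        return 1
    fi
    rmmod nvidia_uvm && modprobe nvidia_uvm
}

# Usage (as root):
#   reload_nvidia_uvm
```

It deliberately avoids `rmmod -f`, so a browser or Electron app that still holds the module has to be closed first.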
sadly, enabling nvidia-suspend.service didn't seem to help :(
I've had partial success with enabling nvidia-suspend.service on my laptop, which has hybrid graphics disabled by the manufacturer (nvidia only), on X11, but there was a 90% chance Firefox would crash on resume. I even tried it on a clean profile to make sure I hadn't messed something up in about:config.
I also tried enabling nvidia-hibernate.service and nvidia-resume.service after reading this (https://download.nvidia.com/XFree86/Linux-x86_64/460.84/README/powermanagement.html), without much luck, but at least hibernating had a higher chance of Firefox not crashing. If it did crash, as long as nothing else was using nvidia-uvm (usually Discord, steam and mullvad-gui), I didn't need to reload the module, but otherwise I had to manually close them and reload.
What did the trick for me was adding nvidia.NVreg_PreserveVideoMemoryAllocations=1 to my kernel parameters. So far sleep and resume have been working like a charm: no crashes, and hardware decoding works right after resume. Some users might also need to add nvidia.NVreg_TemporaryFilePath=/path/to/desired/folder, according to the Arch wiki (https://wiki.archlinux.org/title/NVIDIA/Troubleshooting#Screen_corruption_after_resuming_from_suspend_or_hibernation), as it uses /tmp by default and, depending on the amount of memory being dumped there, that might not be enough.
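To confirm the parameter actually took effect after a reboot, the loaded module's settings can be read back. The sketch below assumes the driver exposes them as "Name: value" lines in /proc/driver/nvidia/params (without the NVreg_ prefix); `nvreg_value` is an illustrative helper name.

```shell
#!/bin/sh
# Print the value the loaded nvidia module reports for one NVreg parameter.
# Assumes "Name: value" lines, with the NVreg_ prefix dropped from the name.
nvreg_value() {
    awk -F': ' -v k="$1" '$1 == k { print $2; found = 1 } END { exit !found }' \
        "${2:-/proc/driver/nvidia/params}"
}

# Usage:
#   nvreg_value PreserveVideoMemoryAllocations   # hoped-for output: 1
#   nvreg_value TemporaryFilePath
#
# The same options can also be set as module options instead of kernel
# parameters, e.g. in a file under /etc/modprobe.d/ (file name is your choice):
#   options nvidia NVreg_PreserveVideoMemoryAllocations=1
```

If the module is loaded from the initramfs (early KMS), the initramfs likely needs regenerating before a modprobe.d change is picked up.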
Since adding this kernel parameter, do you still have any of the nvidia-*.services running? And if so, which ones?
Still have all 3 enabled, suspend, hibernate, and resume. I'll do a quick test to see if decoding breaks again if I disable them and edit my post.
Edit: I did some tests and got some weird results. I'll keep updating this if I reach any conclusions.
- nvidia-* services disabled, kernel parameter added: decoding works, but attempting to sleep or hibernate would cause the screen to go off and come back on a few seconds later, failing to suspend/hibernate
- nvidia-resume enabled, kernel parameter removed: decoding works and no firefox crash after sleep/hibernate, unexpected
- all nvidia-* services enabled, kernel parameter removed: decoding works and no crash
Notably, I've had some kwin bugs come back after removing the kernel parameter, which would completely garble everything on the screen until kwin was restarted or compositing was manually disabled.
Since the core of the issue is a memory leak on nvidia's side, and I did a fresh reboot before the tests, it might take some time for the crash to happen. I'll keep the services enabled for now and the kernel parameter disabled, and try to observe any results.
Other relevant info that might help:
- Running on the linux-zen 5.18.1 kernel
- My card is a GTX 1650 running nvidia-dkms 515.43.04 drivers, with no igpu, as stated before
- I have nvidia modules on early kms
- nvidia-drm.modeset=1 kernel parameter
- ibt=off parameter due to a compatibility issue with the 5.18 kernel
Thanks for this detailed response! I've enabled all 3 of the nvidia services (nvidia-hibernate.service, nvidia-suspend.service and nvidia-resume.service) and added the kernel parameter suggested (nvidia.NVreg_PreserveVideoMemoryAllocations=1). I'll let you know how everything goes over the next few days. I have 32gb of ram so I shouldn't need the nvidia.NVreg_TemporaryFilePath parameter right now. Perhaps this should be added to the README.md?
Another update: after a few sleep/resume cycles with all 3 services on and the parameter off, I noticed decoding falls back to software decoding (cpu), but firefox doesn't crash like it did before for me. Still, it seems like the nvidia.NVreg_PreserveVideoMemoryAllocations=1 parameter was the solution to my specific case: no matter how long the laptop sleeps, when it wakes up hw decoding works as if it were a fresh boot.
I have 32gb of ram so I shouldn't need the nvidia.NVreg_TemporaryFilePath parameter right now.
I'm not sure what the default behavior is without the parameter (dumps to ram? keeps on vram? completely frees vram and has to pull everything back up on resume?), but with it, the card's vram is dumped to your disk/ssd on suspend/hibernate. By default it uses your /tmp directory in the root of your system. Since /tmp is a tmpfs partition, it can run out of space depending on how much vram you have. I don't know if it's like that for every distro, but for me /tmp has 7.8gb of space. Should be more than enough for my 4gb vram card, but higher end models with 8gb of vram or more will definitely need to map it to somewhere else to be safe, like /var/tmp to avoid system lockups.
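The size comparison described here can be checked with a couple of commands. A rough sketch, where `free_mib` and `dir_can_hold` are illustrative names and the VRAM figure comes from nvidia-smi (reported in MiB):

```shell
#!/bin/sh
# Free space, in MiB, on the filesystem holding the given directory.
# df -P keeps each filesystem on one line; column 4 is "Available" in KiB.
free_mib() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Succeed only if DIR has at least NEED_MIB of free space.
# usage: dir_can_hold NEED_MIB DIR
dir_can_hold() {
    [ "$(free_mib "$2")" -ge "$1" ]
}

# Usage (first GPU only):
#   vram=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1)
#   dir_can_hold "$vram" /tmp || echo "/tmp may be too small; consider NVreg_TemporaryFilePath=/var/tmp"
```

This only compares total VRAM against free space at the moment of the check, so treat it as a worst-case estimate and not a guarantee.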
was the solution to my specific case
Same here! After my personal testing, the 3 services + the command line option seems to be giving me a 100% success rate so far when waking from suspend.
but for me /tmp has 7.8gb of space. Should be more than enough for my 4gb vram card, but higher end models with 8gb of vram or more will definitely need to map it to somewhere else to be safe, like /var/tmp to avoid system lockups.
just checked, for me it has about 16gb. This should be enough for my 3070 8gb model, but if I experience any problems later down the road, I'll be sure to try changing the tmp path.
I realize this is an older thread, but I have just enabled the two services (nvidia-hibernate.service and nvidia-suspend.service) and set the nvidia.NVreg_PreserveVideoMemoryAllocations=1 module parameter. I am now able to preserve hardware decoding (firefox, mpv with hwdec=auto) and vainfo no longer throws an error after suspend/resume. Will update if I continue to experience issues down the line, but so far so good.
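For repeatable testing across several suspend/resume cycles, the vainfo check can be wrapped in a small function. This is a sketch under assumptions: `vaapi_ok` and the `VAINFO` override are hypothetical, and the failure string is the one from the log at the top of this thread.

```shell
#!/bin/sh
# Succeed if the VA-API driver initialises. Fail if vainfo exits non-zero
# or its output contains the CUDA init error quoted in the original report.
# VAINFO is a hypothetical override for testing; it defaults to vainfo.
vaapi_ok() {
    out=$("${VAINFO:-vainfo}" 2>&1) || return 1
    case $out in
        *"init cuda error"*) return 1 ;;
    esac
    return 0
}

# Usage, after waking from suspend:
#   LIBVA_DRIVER_NAME=nvidia vaapi_ok && echo "hw decode init OK" || echo "hw decode init FAILED"
```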