optimus-manager
Switching to NVIDIA GPU does not work after suspend
Describe the bug
I recently reinstalled my system, and now I can't get switching to the NVIDIA GPU to work reliably anymore. It works fine after rebooting, but once I suspend the system while in Intel mode, the NVIDIA GPU won't turn on again when I try to switch to it. I had been using switching=bbswitch without problems for ~2 years, since it was the only configuration that worked with my laptop model.
Unfortunately I can't really tell whether this started with the reinstall itself, a newer version of optimus-manager, an NVIDIA driver update or something else. I only noticed it after the reinstall, while testing whether optimus-manager still worked. I also upgraded the BIOS firmware recently, but downgrading it again did not resolve the issue, so I'd rule that out.
System info
Laptop model: Dell XPS 15 9560
OS: Arch Linux
Kernel: 5.7.7-arch1-1
KDE Plasma: 5.19.3
SDDM: 0.18.1
bspwm: 0.9.9
optimus-manager: 1.3.1
optimus-manager.conf:
[optimus]
switching=bbswitch
pci_power_control=no
pci_remove=no
pci_reset=no
I only had the first two lines in there before the reinstall, but changing the remaining settings makes no difference. I also tried switching=acpi_call, but that did not help either.
Logs
optimus-manager --status output:
ERROR: the latest GPU setup attempt failed at Xorg pre-start hook.
Log at /var/log/optimus-manager/switch/switch-20200712T134536.log
Cannot execute command because of previous errors.
/var/log/optimus-manager/switch/switch-20200712T134536.log:
[17] INFO: # Xorg pre-start hook
[17] INFO: Previous state was: {'type': 'pending_pre_xorg_start', 'requested_mode': 'nvidia', 'current_mode': 'intel'}
[17] INFO: Requested mode is: nvidia
[17] INFO: Checking for GDM display servers
[796] INFO: Available modules: ['nouveau', 'bbswitch', 'nvidia', 'nvidia_drm', 'nvidia_modeset', 'nvidia_uvm']
[796] INFO: Unloading modules ['nouveau'] (if loaded)
[799] INFO: Loading module bbswitch
[801] INFO: Setting GPU power to ON via bbswitch
[829] INFO: Loading module nvidia
[1557] ERROR: Xorg pre-start setup error
Traceback (most recent call last):
File "/usr/lib/python3.8/site-packages/optimus_manager/bash.py", line 11, in exec_bash
out = subprocess.check_output(
File "/usr/lib/python3.8/subprocess.py", line 411, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/usr/lib/python3.8/subprocess.py", line 512, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['bash', '-c', 'modprobe nvidia NVreg_UsePageAttributeTable=1']' returned non-zero exit status 1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.8/site-packages/optimus_manager/kernel.py", line 179, in _load_module
exec_bash("modprobe %s %s" % (module, options))
File "/usr/lib/python3.8/site-packages/optimus_manager/bash.py", line 18, in exec_bash
raise BashError(
optimus_manager.bash.BashError: Failed to execute 'modprobe nvidia NVreg_UsePageAttributeTable=1' :
modprobe: ERROR: could not insert 'nvidia': No such device
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.8/site-packages/optimus_manager/hooks/pre_xorg_start.py", line 45, in main
setup_kernel_state(config, prev_state, requested_mode)
File "/usr/lib/python3.8/site-packages/optimus_manager/kernel.py", line 21, in setup_kernel_state
_nvidia_up(config)
File "/usr/lib/python3.8/site-packages/optimus_manager/kernel.py", line 56, in _nvidia_up
_load_nvidia_modules(config, available_modules)
File "/usr/lib/python3.8/site-packages/optimus_manager/kernel.py", line 109, in _load_nvidia_modules
_load_module(available_modules, "nvidia", options="NVreg_UsePageAttributeTable=%d" % pat_value)
File "/usr/lib/python3.8/site-packages/optimus_manager/kernel.py", line 181, in _load_module
raise KernelSetupError("error running modprobe for %s : %s" % (module, str(e)))
optimus_manager.kernel.KernelSetupError: error running modprobe for nvidia : Failed to execute 'modprobe nvidia NVreg_UsePageAttributeTable=1' :
modprobe: ERROR: could not insert 'nvidia': No such device
[1557] INFO: Removing /etc/X11/xorg.conf.d/10-optimus-manager.conf (if present)
[1558] INFO: Writing state {'type': 'pre_xorg_start_failed', 'switch_id': '20200712T134536', 'requested_mode': 'nvidia'}
Part of dmesg output:
[ 248.049467] bbswitch: enabling discrete graphics
[ 248.049482] pci 0000:01:00.0: can't change power state from D3cold to D0 (config space inaccessible)
[ 248.051760] pci 0000:01:00.0: can't change power state from D3cold to D0 (config space inaccessible)
[ 248.674732] nvidia-nvlink: Nvlink Core is being initialized, major device number 236
[ 248.675238] NVRM: This is a 64-bit BAR mapped above 4GB by the system
NVRM: BIOS or the Linux kernel, but the PCI bridge
NVRM: immediately upstream of this GPU does not define
NVRM: a matching prefetchable memory window.
[ 248.675239] NVRM: This may be due to a known Linux kernel bug. Please
NVRM: see the README section on 64-bit BARs for additional
NVRM: information.
[ 248.675242] nvidia: probe of 0000:01:00.0 failed with error -1
[ 248.675265] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 248.675265] NVRM: None of the NVIDIA devices were initialized.
[ 248.675729] nvidia-nvlink: Unregistered the Nvlink Core, major device number 236
Have you tried using pci_reset=hot_reset?
https://github.com/Askannz/optimus-manager/wiki/A-guide--to-power-management-options
I tried that with both switching=bbswitch and switching=acpi_call, but I still get the same error.
I also downgraded optimus-manager to 1.2, because I know that version definitely worked before. As that didn't help, I downgraded the kernel, the NVIDIA driver and bbswitch (I just picked the last versions released in 2019, because I'm not very experienced with downgrading packages). Now switching works again, even after suspending the system, and even with optimus-manager 1.3.1.
So in the end, my problem is caused by some change in the kernel or the NVIDIA driver. I'll try to investigate further, but I'm not sure whether this is something optimus-manager has to deal with. I'll leave this issue open for now.
#291: I've got a similar issue, though not quite the same. Do our logs match?
I don't think this is a similar issue. For me, modprobe nvidia fails after suspending the system (see logs above).
I see, sorry for the misunderstanding. If no one has solved this in a week or so, I'll try to look into it when I have time. I do strongly suspect a kernel or driver issue, though.
After trying different kernel and NVIDIA driver versions, it seems the issue was introduced with kernel 5.7. Kernel 5.6 with nvidia-dkms 450.57 works without any problems. This was also reported here.
I'll stick with kernel 5.6 for now, but I'm unsure what to do next. Should I file a bug report on the kernel bug tracker?
There's probably not much I can do from the optimus-manager side, but one thing you could try is powering up the card just before suspend (I suspect it doesn't like going to sleep while it is turned off by bbswitch). To do that, run echo ON | sudo tee /proc/acpi/bbswitch before initiating suspend.
I can confirm that this approach works. I put the command into a systemd-sleep pre/post hook, so it is now automated:
cat /usr/lib/systemd/system-sleep/nvidia_suspend.sh
#!/bin/bash
# systemd-sleep runs every executable in this directory as root, passing
# "pre" before suspend and "post" after resume as $1, so sudo is not needed.
if [ "${1}" = "pre" ]; then
    # Power the NVIDIA GPU on before suspend
    echo ON | tee /proc/acpi/bbswitch
elif [ "${1}" = "post" ]; then
    # Power it off again after resume
    echo OFF | tee /proc/acpi/bbswitch
fi
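A possible refinement, sketched here as an assumption rather than something tested on the XPS 15: instead of always writing OFF after resume, the hook can remember what bbswitch reported before suspend and restore exactly that. The function names, the scratch file /run/nvidia-suspend.bbswitch and the BBSWITCH/STATE_FILE overrides are all hypothetical; only /proc/acpi/bbswitch, its "0000:01:00.0 ON/OFF" output format, and the "pre"/"post" arguments from systemd-sleep come from the setup discussed here.

```shell
#!/bin/bash
# Hypothetical variant of nvidia_suspend.sh: save the bbswitch state before
# suspend and restore that same state after resume (not an optimus-manager file).
# systemd-sleep calls hooks as root with $1 = "pre" or "post".

BBSWITCH="${BBSWITCH:-/proc/acpi/bbswitch}"              # bbswitch control file
STATE_FILE="${STATE_FILE:-/run/nvidia-suspend.bbswitch}"  # assumed scratch location

# Print "ON" or "OFF": the second field of a line like "0000:01:00.0 OFF".
bbswitch_state() {
    awk '{ print $2 }' "$BBSWITCH"
}

on_pre() {
    bbswitch_state > "$STATE_FILE"   # remember the current power state
    echo ON > "$BBSWITCH"            # power the GPU up before the system sleeps
}

on_post() {
    local saved=""
    if [ -r "$STATE_FILE" ]; then
        saved=$(cat "$STATE_FILE")
    fi
    [ -n "$saved" ] || saved=OFF     # nothing saved: fall back to the original OFF
    echo "$saved" > "$BBSWITCH"      # restore the pre-suspend state
    rm -f "$STATE_FILE"
}

handle() {
    [ -e "$BBSWITCH" ] || return 0   # bbswitch not loaded: nothing to do
    case "$1" in
        pre)  on_pre ;;
        post) on_post ;;
    esac
}

# Only act when called with an argument (as systemd-sleep does).
if [ "$#" -gt 0 ]; then
    handle "$1"
fi
```

Compared with the fixed OFF in the original hook, this keeps the GPU ON after resume if it was ON before suspend (Nvidia mode) and OFF if it was OFF (Intel mode).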
I somehow missed your suggestion to try powering up the card before suspend, but after @rocky7x's comment I can also confirm that this works. I've set up the systemd-sleep script as well, so now I don't have to think about this anymore and I can use the current kernel again without problems.
Thanks to all for the help!
I seem to have the same issue (at least according to the logs), but with a different trigger: for me, it happens if I boot my system on battery in Intel mode and then try to switch to Nvidia mode. There is no such issue when booting with the power cable plugged in.
It's weird, to say the least.
I'll try the scripts mentioned at https://wiki.archlinux.org/index.php/Talk:Dell_XPS_15_9570 (I have a similar Dell laptop) today; maybe they'll help. I'll report back if they fix anything.
I integrated the systemd-sleep script solution into optimus-manager; let me know if it fixes your issue. (It also fixed a bunch of issues related to the Nvidia GPU waking up by itself after suspend, which is nice!)
Thank you, @Askannz! Now it works again without the systemd-sleep script.
I am experiencing the same issue right now. It worked fine after the first install, but it doesn't work anymore. Any suggestions?
Sup @beyond9thousand and @Dzjed
Is this still reproducible on a current distribution with an up-to-date Linux kernel and NVIDIA driver, using the optimus-manager-git package? The project is undergoing a revamp, so please stick with the -git version of this package.
Let me know if this issue is still relevant to you, OK?
This may have been fixed in a more recent driver, because, as the output states, the nvidia module fails specifically when the NVreg_UsePageAttributeTable=1 option is passed during module load:
Traceback (most recent call last):
File "/usr/lib/python3.8/site-packages/optimus_manager/kernel.py", line 179, in _load_module
exec_bash("modprobe %s %s" % (module, options))
File "/usr/lib/python3.8/site-packages/optimus_manager/bash.py", line 18, in exec_bash
raise BashError(
optimus_manager.bash.BashError: Failed to execute 'modprobe nvidia NVreg_UsePageAttributeTable=1' :
modprobe: ERROR: could not insert 'nvidia': No such device
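To check whether this still happens on a current setup, the failing step can be repeated by hand after a suspend/resume cycle in Intel mode. The sketch below is a hypothetical helper, not part of optimus-manager: it prints what bbswitch reports, runs the same modprobe command that appears in the log, and on failure shows the kernel lines that explained the problem in this thread (bbswitch, "D3cold" and NVRM messages). The function name and the MODPROBE override (there only so the function can be tested without root) are assumptions; run it as root.

```shell
#!/bin/bash
# Hypothetical diagnostic helper: repeat the step that fails in the Xorg
# pre-start hook and collect related kernel messages. Run as root.

BBSWITCH="${BBSWITCH:-/proc/acpi/bbswitch}"  # bbswitch control file
MODPROBE="${MODPROBE:-modprobe}"             # overridable only for testing

try_load_nvidia() {
    # What bbswitch believes the GPU power state is (e.g. "0000:01:00.0 ON").
    if [ -r "$BBSWITCH" ]; then
        echo "bbswitch: $(cat "$BBSWITCH")"
    else
        echo "bbswitch: $BBSWITCH not available"
    fi

    # Same command and option as in the optimus-manager log.
    if "$MODPROBE" nvidia NVreg_UsePageAttributeTable=1; then
        echo "nvidia module loaded"
        return 0
    fi

    # On failure, show the kernel messages that matter for this issue.
    echo "modprobe failed; recent related kernel messages:" >&2
    dmesg 2>/dev/null | grep -E 'bbswitch|D3cold|NVRM' | tail -n 20 >&2
    return 1
}
```

If the module does load, unload it again with modprobe -r nvidia before switching through optimus-manager, so the manual test doesn't interfere with the normal switching flow.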
Have a nice week.