optimus-manager

Switching to NVIDIA GPU does not work after suspend

Open Dzjed opened this issue 4 years ago • 13 comments

Describe the bug

I recently did a reinstall of my system and now I can't get switching to the NVIDIA GPU to work reliably anymore. It works just fine after rebooting, but once I suspend the system while in Intel mode, the NVIDIA GPU won't turn on again when trying to switch to it. I had been using switching=bbswitch just fine for ~2 years, since this was the only configuration that was working with my laptop model.

Unfortunately I can't really tell whether this started with the reinstall itself, a newer version of optimus-manager, an NVIDIA driver update or something else. I just noticed it after the reinstall while testing whether optimus-manager works. I also upgraded the BIOS firmware recently, but downgrading it again did not resolve the issue, so I'd rule that out.

System info

Laptop Model: Dell XPS 15 9560
OS: Arch Linux
Kernel: 5.7.7-arch1-1
KDE Plasma: 5.19.3
SDDM: 0.18.1
bspwm: 0.9.9
optimus-manager: 1.3.1

optimus-manager.conf:

[optimus]
switching=bbswitch
pci_power_control=no
pci_remove=no
pci_reset=no

Before the reinstall I only had the first two lines in there; trying different values for the three pci_* settings makes no difference. I also tried switching=acpi_call, but that did not help either.

Logs

optimus-manager --status output:

ERROR: the latest GPU setup attempt failed at Xorg pre-start hook.
Log at /var/log/optimus-manager/switch/switch-20200712T134536.log

Cannot execute command because of previous errors.

/var/log/optimus-manager/switch/switch-20200712T134536.log:

[17] INFO: # Xorg pre-start hook
[17] INFO: Previous state was: {'type': 'pending_pre_xorg_start', 'requested_mode': 'nvidia', 'current_mode': 'intel'}
[17] INFO: Requested mode is: nvidia
[17] INFO: Checking for GDM display servers
[796] INFO: Available modules: ['nouveau', 'bbswitch', 'nvidia', 'nvidia_drm', 'nvidia_modeset', 'nvidia_uvm']
[796] INFO: Unloading modules ['nouveau'] (if loaded)
[799] INFO: Loading module bbswitch
[801] INFO: Setting GPU power to ON via bbswitch
[829] INFO: Loading module nvidia
[1557] ERROR: Xorg pre-start setup error
Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/optimus_manager/bash.py", line 11, in exec_bash
    out = subprocess.check_output(
  File "/usr/lib/python3.8/subprocess.py", line 411, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.8/subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['bash', '-c', 'modprobe nvidia NVreg_UsePageAttributeTable=1']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/optimus_manager/kernel.py", line 179, in _load_module
    exec_bash("modprobe %s %s" % (module, options))
  File "/usr/lib/python3.8/site-packages/optimus_manager/bash.py", line 18, in exec_bash
    raise BashError(
optimus_manager.bash.BashError: Failed to execute 'modprobe nvidia NVreg_UsePageAttributeTable=1' :
modprobe: ERROR: could not insert 'nvidia': No such device


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/optimus_manager/hooks/pre_xorg_start.py", line 45, in main
    setup_kernel_state(config, prev_state, requested_mode)
  File "/usr/lib/python3.8/site-packages/optimus_manager/kernel.py", line 21, in setup_kernel_state
    _nvidia_up(config)
  File "/usr/lib/python3.8/site-packages/optimus_manager/kernel.py", line 56, in _nvidia_up
    _load_nvidia_modules(config, available_modules)
  File "/usr/lib/python3.8/site-packages/optimus_manager/kernel.py", line 109, in _load_nvidia_modules
    _load_module(available_modules, "nvidia", options="NVreg_UsePageAttributeTable=%d" % pat_value)
  File "/usr/lib/python3.8/site-packages/optimus_manager/kernel.py", line 181, in _load_module
    raise KernelSetupError("error running modprobe for %s : %s" % (module, str(e)))
optimus_manager.kernel.KernelSetupError: error running modprobe for nvidia : Failed to execute 'modprobe nvidia NVreg_UsePageAttributeTable=1' :
modprobe: ERROR: could not insert 'nvidia': No such device

[1557] INFO: Removing /etc/X11/xorg.conf.d/10-optimus-manager.conf (if present)
[1558] INFO: Writing state {'type': 'pre_xorg_start_failed', 'switch_id': '20200712T134536', 'requested_mode': 'nvidia'}

Part of dmesg output:

[  248.049467] bbswitch: enabling discrete graphics
[  248.049482] pci 0000:01:00.0: can't change power state from D3cold to D0 (config space inaccessible)
[  248.051760] pci 0000:01:00.0: can't change power state from D3cold to D0 (config space inaccessible)
[  248.674732] nvidia-nvlink: Nvlink Core is being initialized, major device number 236
[  248.675238] NVRM: This is a 64-bit BAR mapped above 4GB by the system
               NVRM: BIOS or the Linux kernel, but the PCI bridge
               NVRM: immediately upstream of this GPU does not define
               NVRM: a matching prefetchable memory window.
[  248.675239] NVRM: This may be due to a known Linux kernel bug.  Please
               NVRM: see the README section on 64-bit BARs for additional
               NVRM: information.
[  248.675242] nvidia: probe of 0000:01:00.0 failed with error -1
[  248.675265] NVRM: The NVIDIA probe routine failed for 1 device(s).
[  248.675265] NVRM: None of the NVIDIA devices were initialized.
[  248.675729] nvidia-nvlink: Unregistered the Nvlink Core, major device number 236
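For anyone comparing their own logs against this report: the telling lines are the two "can't change power state from D3cold to D0" messages that precede the failed nvidia probe. The sketch below filters kernel log text on stdin and prints the PCI address of each device that logged that message. The function name d3cold_stuck_devices is made up for illustration, and the pattern assumes the message format shown in the dmesg excerpt above.

```shell
# Print the PCI address of every device that failed to leave D3cold,
# one address per line, reading kernel log text from stdin.
# Assumes lines shaped like:
#   [  248.049482] pci 0000:01:00.0: can't change power state from D3cold to D0 (...)
d3cold_stuck_devices() {
    sed -n "s/.* \([0-9a-f:.]*\): can't change power state from D3cold to D0.*/\1/p" |
        sort -u
}

# Typical use (reading dmesg may require root on some systems):
#   dmesg | d3cold_stuck_devices
```

An empty result means the log does not contain this particular signature, so a failed switch probably has a different cause.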

Dzjed • Jul 12 '20 12:07

Have you tried using pci_reset=hot_reset? https://github.com/Askannz/optimus-manager/wiki/A-guide--to-power-management-options

Ann1kaB • Jul 13 '20 23:07

> Have you tried using pci_reset=hot_reset? https://github.com/Askannz/optimus-manager/wiki/A-guide--to-power-management-options

I tried that with both switching=bbswitch and switching=acpi_call but still get the same error.

I also downgraded optimus-manager to 1.2 because I know that this version definitely worked before. As that didn't help, I downgraded the kernel, the NVIDIA driver and bbswitch (I just picked the last versions that were released in 2019, because I'm not very experienced with downgrading packages), and now switching works again even after suspending the system (with optimus-manager 1.3.1).

So in the end my problem is caused by some change to the kernel or the NVIDIA driver. I'll try to investigate further, but I'm not sure whether this is something that optimus-manager will have to deal with. I'd just leave this issue open for now.

Dzjed • Jul 14 '20 10:07

#291 I've got a similar issue, though, not quite the same. Do our logs match?

Alarg • Jul 14 '20 10:07

> #291 I've got a similar issue, though, not quite the same. Do our logs match?

I don't think this is a similar issue. For me modprobe nvidia fails after suspending the system (see logs above).

Dzjed • Jul 14 '20 12:07

> > #291 I've got a similar issue, though, not quite the same. Do our logs match?
>
> I don't think this is a similar issue. For me modprobe nvidia fails after suspending the system (see logs above).

I see, sorry for the misunderstanding. I'll try to look into it in about a week or so, when I have time, if no one has solved these by then. However, I strongly suspect a kernel or driver issue.

Alarg • Jul 14 '20 12:07

After trying different kernel / NVIDIA driver versions, it seems that the issue was introduced with kernel version 5.7. Kernel version 5.6 with nvidia-dkms 450.57 is working without any problems. This was also reported here.

I'll stick to kernel version 5.6 for now, but I'm unsure what to do next. Should I file a bug report on the kernel bug tracker?
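To check whether a given machine runs a kernel in the range described above, a small POSIX shell helper can compare the release string against 5.7. This is only a sketch: the function name kernel_at_least is hypothetical, and the 5.7 boundary comes from the observation in this comment, not from a confirmed kernel bug report.

```shell
# Succeed if kernel release $1 (e.g. "5.7.7-arch1-1") is at least
# version $2.$3 (major.minor). Only major and minor are compared.
kernel_at_least() {
    release=$1
    want_major=$2
    want_minor=$3
    major=${release%%.*}          # text before the first dot
    rest=${release#*.}            # text after the first dot
    minor=${rest%%[!0-9]*}        # leading digits of what follows
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}

# Typical use:
#   kernel_at_least "$(uname -r)" 5 7 && echo "kernel is 5.7 or newer"
```

The helper keeps its variables global for portability to plain sh, so it is best used in a short script and not sourced into a long-lived shell.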

Dzjed • Jul 16 '20 11:07

There's probably not much I can do from the optimus-manager side, but one thing you could try is powering up the card just before suspend (I suspect it doesn't like going to sleep while being turned off by bbswitch). To do that, run echo ON | sudo tee /proc/acpi/bbswitch before initiating suspend.

Askannz • Jul 25 '20 07:07

I can confirm that this approach works. I have put it into a systemd-sleep pre/post hook script, so it is now automated.

cat /usr/lib/systemd/system-sleep/nvidia_suspend.sh

#!/bin/bash
if [ "${1}" = "pre" ]; then
    # Do the thing you want before suspend here
    echo ON | sudo tee /proc/acpi/bbswitch
elif [ "${1}" = "post" ]; then
    # Do the thing you want after resume here
    echo OFF | sudo tee /proc/acpi/bbswitch
fi
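The hook above always switches the card off after resume, whatever state it was in beforehand. A possible variant, sketched here under a few assumptions, remembers the state before suspend and only powers the card down again if it was off. The function names and the STATE_FILE path are hypothetical; the sketch relies on systemd running system-sleep hooks as root (so sudo is not needed) and on /proc/acpi/bbswitch reporting a line such as "0000:01:00.0 OFF".

```shell
#!/bin/sh
# Sketch of a systemd-sleep hook. systemd calls it with $1 = pre|post
# and $2 = the sleep action (suspend, hibernate, ...).
# Both paths can be overridden, which makes it possible to try the
# logic on ordinary files first.
BBSWITCH="${BBSWITCH:-/proc/acpi/bbswitch}"
STATE_FILE="${STATE_FILE:-/run/bbswitch-pre-suspend-state}"  # hypothetical path

# Print ON or OFF: the last field of the bbswitch status line.
gpu_state() {
    awk '{ print $NF; exit }' "$BBSWITCH"
}

before_suspend() {
    gpu_state > "$STATE_FILE"   # remember the state for the resume step
    echo ON > "$BBSWITCH"       # power the card up before the system sleeps
}

after_resume() {
    # Only switch the card off again if it was off before suspend.
    if [ "$(cat "$STATE_FILE" 2>/dev/null)" = "OFF" ]; then
        echo OFF > "$BBSWITCH"
    fi
    rm -f "$STATE_FILE"
}

case "${1:-}" in
    pre)  before_suspend ;;
    post) after_resume ;;
esac
```

Like the original, it would have to be placed in /usr/lib/systemd/system-sleep/ and made executable.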

rocky7x • Aug 18 '20 06:08

> There's probably not much I can do from the optimus-manager side, but one thing you could try is powering up the card just before suspend (I suspect it doesn't like going to sleep while being turned off by bbswitch). To do that, run echo ON | sudo tee /proc/acpi/bbswitch before initiating suspend.

I somehow missed your suggestion to try powering up the card before suspend, but after @rocky7x's comment I can also confirm that this works. I've set up the systemd-sleep script as well, so now I don't have to think about this anymore and I can use the current kernel again without problems.

Thanks to all for the help!

Dzjed • Aug 18 '20 07:08

I seem to have the same issue (at least according to the logs), but with a different trigger: for me, it happens if I start my system on battery power, in Intel mode, and then try to switch to Nvidia mode. There is no such issue when starting with the power cable plugged in.

It's weird, to say the least.

I'll be trying the scripts mentioned at https://wiki.archlinux.org/index.php/Talk:Dell_XPS_15_9570 (I have a similar type of Dell laptop) today, maybe it'll help a bit. I'll report back if it fixes anything.

StanczakDominik • Sep 02 '20 05:09

I integrated the systemd-sleep script solution into optimus-manager, let me know if it fixes your issue. (It also fixed a bunch of issues related to the Nvidia GPU waking up by itself after suspend, which is nice!)

Askannz • Oct 10 '20 08:10

Thank you, @Askannz! Now it works again without the systemd-sleep script.

Dzjed • Oct 15 '20 12:10

> I integrated the systemd-sleep script solution to optimus-manager, let me know if it fixes your issue (It also fixed a bunch of issues related to the Nvidia GPU waking up by itself after suspend, which is nice !).

I am experiencing the same issue right now. It was working fine after the first install but it doesn't work anymore. Any suggestions?

beyond9thousand • Dec 27 '21 09:12

Sup @beyond9thousand and @Dzjed

Is this still reproducible on a current distribution with an up-to-date Linux kernel and NVIDIA driver, using the optimus-manager-git package? The project is undergoing a revamp, so please stick with the -git version of this package.

Let me know if this issue is still relevant to you, OK?

This may have been fixed in a recent driver release: as the output below shows, loading the nvidia module fails specifically when the NVreg_UsePageAttributeTable=1 option is passed.

Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/optimus_manager/kernel.py", line 179, in _load_module
    exec_bash("modprobe %s %s" % (module, options))
  File "/usr/lib/python3.8/site-packages/optimus_manager/bash.py", line 18, in exec_bash
    raise BashError(
optimus_manager.bash.BashError: Failed to execute 'modprobe nvidia NVreg_UsePageAttributeTable=1' :
modprobe: ERROR: could not insert 'nvidia': No such device

Have a nice week.

nwildner • Jul 03 '24 10:07