Unable to switch to NVIDIA on kernel 6.0

Open Nitrooo opened this issue 1 year ago • 9 comments

I'm not sure if it can be fixed in Optimus Manager, but I'm going to post it anyway. The problem is described here: https://forum.manjaro.org/t/video-nvidia-470xx-fails-on-kernel-6-0/123867

I found a similar issue here: https://github.com/Bumblebee-Project/Bumblebee/issues/628

I'm on Manjaro Cinnamon. The laptop is a Lenovo T440p. Optimus Manager v1.4. No error in the logs as far as I remember.

Nitrooo avatar Oct 12 '22 19:10 Nitrooo

I can confirm. The problem is "Command 'modprobe nvidia NVreg_UsePageAttributeTable=1 NVreg_DynamicPowerManagement=0x02' died with <Signals.SIGSEGV: 11>." Switching to an older kernel works well.

Here is the full log (just in case it is needed):


[6] INFO: # Xorg pre-start hook
[6] INFO: Previous state was: {'type': 'pending_pre_xorg_start', 'requested_mode': 'hybrid', 'current_mode': None}
[6] INFO: Requested mode is: hybrid
[22] INFO: Available modules: ['nouveau', 'bbswitch', 'nvidia', 'nvidia_drm', 'nvidia_modeset', 'nvidia_uvm']
[22] INFO: Unloading modules ['nouveau'] (if loaded)
[24] INFO: Loading module bbswitch
[50] INFO: Setting GPU power to ON via bbswitch
[62] INFO: Resetting Nvidia PCI device
[62] INFO: Unloading modules ['bbswitch'] (if loaded)
[93] INFO: Performing function-level reset of Nvidia
[103] INFO: Writing "1" to /sys/bus/pci/devices/0000:01:00.0/reset
[209] INFO: Writing "1" to /sys/bus/pci/devices/0000:01:00.1/reset
[209] ERROR: Nvidia PCI reset failed. Continuing anyways. Error is: Failed to perform PCI reset: Error writing to /sys/bus/pci/devices/0000:01:00.1/reset: [Errno 13] Permission denied: '/sys/bus/pci/devices/0000:01:00.1/reset'
[210] INFO: Setting Nvidia PCI power state to auto
[226] INFO: Writing "auto" to /sys/bus/pci/devices/0000:01:00.0/power/control
[227] INFO: Writing "auto" to /sys/bus/pci/devices/0000:01:00.1/power/control
[238] INFO: Loading module nvidia
[1295] ERROR: Xorg pre-start setup error
Traceback (most recent call last):
  File "/usr/lib/python3.10/site-packages/optimus_manager/kernel.py", line 245, in _load_module
    subprocess.check_call(
  File "/usr/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'modprobe nvidia NVreg_UsePageAttributeTable=1 NVreg_DynamicPowerManagement=0x02' died with <Signals.SIGSEGV: 11>.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.10/site-packages/optimus_manager/hooks/pre_xorg_start.py", line 51, in main
    setup_kernel_state(config, prev_state, requested_mode)
  File "/usr/lib/python3.10/site-packages/optimus_manager/kernel.py", line 22, in setup_kernel_state
    _nvidia_up(config, hybrid=(requested_mode == "hybrid"))
  File "/usr/lib/python3.10/site-packages/optimus_manager/kernel.py", line 95, in _nvidia_up
    _load_nvidia_modules(config, available_modules)
  File "/usr/lib/python3.10/site-packages/optimus_manager/kernel.py", line 164, in _load_nvidia_modules
    _load_module(available_modules, "nvidia", options=nvidia_options)
  File "/usr/lib/python3.10/site-packages/optimus_manager/kernel.py", line 249, in _load_module
    raise KernelSetupError(f"Error running modprobe for {module}: {e.stderr}") from e
optimus_manager.kernel.KernelSetupError: Error running modprobe for nvidia: None
[1297] INFO: Removing /etc/X11/xorg.conf.d/10-optimus-manager.conf (if present)
[1297] INFO: Writing state {'type': 'pre_xorg_start_failed', 'switch_id': '20221016T173709', 'requested_mode': 'hybrid'}


Hardware: MSI Pulse GL66, i7-11800, 3060 mobile

Switching method: bbswitch
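
A note on the last line of the traceback: "Error running modprobe for nvidia: None" most likely says None because subprocess.check_call does not capture the child's stderr, so e.stderr is never filled in. The useful information is the negative return code, which encodes the signal that killed modprobe. Below is a minimal sketch of how that can be decoded. The helper names describe_returncode and run_and_describe are hypothetical and are not part of optimus-manager, and the child process is a harmless stand-in for modprobe.

```python
import signal
import subprocess
import sys


def describe_returncode(returncode):
    """Turn a subprocess return code into a human-readable description."""
    if returncode == 0:
        return "exited successfully"
    if returncode < 0:
        # On POSIX, a return code of -N means the child was killed by signal N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_and_describe(cmd):
    """Run cmd and capture stderr so it is still available when the command fails."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return describe_returncode(result.returncode), result.stderr


if __name__ == "__main__":
    # Stand-in for modprobe: a child Python process that writes to stderr and exits with 3.
    outcome, stderr = run_and_describe(
        [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(3)"]
    )
    print(outcome, "| stderr:", stderr)
```

With this approach a segfaulting command would be reported as "killed by SIGSEGV" and any message it printed before dying would be kept instead of None.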

ECO1AI avatar Oct 16 '22 14:10 ECO1AI

Same here! In nvidia mode it gets stuck, and in hybrid mode the kernel panics. Integrated mode works normally.

yichangshengwu avatar Oct 27 '22 07:10 yichangshengwu

Oops, sorry for the delay.

The problem might be with the kernel.

It looks like kernel 6 is blacklisting the NVIDIA driver.

Try this:

When booting via GRUB, edit the boot entry by pressing e.

Then find the line that looks like this (there might be some minor differences):

linux /boot/vmlinuz-linux root=UUID=0a3407de-014b-458b-b5c1-848e92a327a3 rw initrd=/boot/initramfs-linux.img

Add ibt=off to the end of that line, boot, and tell me the results.

ECO1AI avatar Oct 27 '22 07:10 ECO1AI

I added ibt=off to GRUB_CMDLINE_LINUX_DEFAULT but it didn't work. Same error.
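
A change to GRUB_CMDLINE_LINUX_DEFAULT normally only takes effect after the GRUB configuration is regenerated and the machine is rebooted, so it can be worth confirming that the parameter actually reached the running kernel. The sketch below reads /proc/cmdline and looks for ibt=off. The helper names kernel_params and has_param are made up for this example, and the parser is deliberately simple (it does not handle quoted values containing spaces).

```python
from pathlib import Path


def kernel_params(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        # Split on the first "=" only, so "root=UUID=abc" keeps "UUID=abc" as its value.
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def has_param(cmdline, name, value):
    """Return True if the command line sets name to exactly value."""
    return kernel_params(cmdline).get(name) == value


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")
    if proc_cmdline.exists():
        print("ibt=off active:", has_param(proc_cmdline.read_text(), "ibt", "off"))
    else:
        print("/proc/cmdline not available on this system")
```

If this prints False after a reboot, the parameter never reached the kernel and the test of ibt=off is inconclusive.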

Nitrooo avatar Oct 27 '22 11:10 Nitrooo

Send me the output of sudo journalctl -p 3 -xb and systemctl status optimus-manager.service, plus a screenshot of neofetch and the Optimus Manager log file.

I'll see what I can do.
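
For anyone who has to gather this information repeatedly (for example once per kernel), the commands requested above can be collected in one go. This is only a sketch: the collect helper and the DIAGNOSTICS labels are hypothetical, the journal may need root to show everything, and the Optimus Manager log file is left out because its location is not given in this thread.

```python
import subprocess

# Commands requested in the thread; --no-pager keeps the output non-interactive.
DIAGNOSTICS = {
    "journal-errors": ["journalctl", "--no-pager", "-p", "3", "-xb"],
    "optimus-manager-service": [
        "systemctl", "--no-pager", "status", "optimus-manager.service",
    ],
    "neofetch": ["neofetch", "--stdout"],
}


def collect(commands, timeout=30.0):
    """Run each command and return {label: combined output or a short failure note}."""
    report = {}
    for label, cmd in commands.items():
        try:
            result = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout
            )
            # Keep stderr too: a failing command often explains itself there.
            report[label] = result.stdout + result.stderr
        except FileNotFoundError:
            report[label] = f"(command not found: {cmd[0]})"
        except subprocess.TimeoutExpired:
            report[label] = f"(timed out after {timeout} s)"
    return report
```

Calling collect(DIAGNOSTICS) and printing each entry under its label gives a single block of text that can be pasted into a reply.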

ECO1AI avatar Oct 27 '22 16:10 ECO1AI

journalctl -p 3 -xb

ott 27 19:07:00 T440p kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.SAT0], AE_NOT_FOUND (20220331/dswload2-162)
ott 27 19:07:00 T440p kernel: ACPI Error: AE_NOT_FOUND, During name lookup/catalog (20220331/psobject-220)
ott 27 19:07:00 T440p kernel: ACPI Error: Needed type [Reference], found [Integer] 000000004ddc5f32 (20220331/exresop-66)
ott 27 19:07:00 T440p kernel: ACPI Error: AE_AML_OPERAND_TYPE, While resolving operands for [Store] (20220331/dswexec-431)
ott 27 19:07:00 T440p kernel: ACPI Error: Aborting method \_PR.CPU0._PDC due to previous error (AE_AML_OPERAND_TYPE) (20220331/psparse-529)
ott 27 19:07:01 T440p kernel: snd_hda_intel 0000:02:00.1: no codecs found!
ott 27 19:07:01 T440p systemd-udevd[353]: could not read from '/sys/module/pcc_cpufreq/initstate': No such device
ott 27 19:07:03 T440p kernel: 
ott 27 19:07:04 T440p kernel: [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000200] Failed to allocate NvKmsKapiDevice
ott 27 19:07:04 T440p kernel: [drm:nv_drm_probe_devices [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000200] Failed to register device
ott 27 19:07:24 T440p systemd[1]: Failed to start Light Display Manager.
░░ Subject: A start job for unit lightdm.service has failed
░░ Defined-By: systemd
░░ Support: https://forum.manjaro.org/c/support
░░ 
░░ A start job for unit lightdm.service has finished with a failure.
░░ 
░░ The job identifier is 1851 and the job result is failed.

systemctl status optimus-manager.service

● optimus-manager.service - Optimus Manager Commands Daemon
     Loaded: loaded (/usr/lib/systemd/system/optimus-manager.service; enabled; preset: disabled)
     Active: active (running) since Thu 2022-10-27 19:07:04 CEST; 2min 31s ago
    Process: 728 ExecStartPre=/usr/bin/python3 -u -m optimus_manager.hooks.pre_daemon_start (code=exited, status=0/SUCCESS)
    Process: 847 ExecStartPre=/usr/bin/python3 -u -m optimus_manager.hooks.pre_xorg_start (code=exited, status=0/SUCCESS)
   Main PID: 1032 (python3)
      Tasks: 1 (limit: 19029)
     Memory: 44.4M
        CPU: 1.986s
     CGroup: /system.slice/optimus-manager.service
             └─1032 /usr/bin/python3 -u -m optimus_manager.daemon

ott 27 19:07:02 T440p python3[847]: [73] INFO: Loading module nvidia
ott 27 19:07:04 T440p python3[847]: [1677] INFO: Loading module nvidia_drm
ott 27 19:07:04 T440p python3[847]: [1847] INFO: Loaded extra nvidia-mode Xorg options (1 lines)
ott 27 19:07:04 T440p python3[847]: [1848] INFO: Writing to /etc/X11/xorg.conf.d/10-optimus-manager.conf
ott 27 19:07:04 T440p python3[847]: [1848] INFO: Writing state {'type': 'pending_post_xorg_start', 'switch_id': '20221027T190702', 'requested_mode': 'nvidia'}
ott 27 19:07:04 T440p python3[847]: [1849] INFO: Xorg pre-start hook completed successfully.
ott 27 19:07:04 T440p systemd[1]: Started Optimus Manager Commands Daemon.
ott 27 19:07:04 T440p python3[1032]: [1] INFO: # Commands daemon
ott 27 19:07:04 T440p python3[1032]: [1] INFO: Opening UNIX socket
ott 27 19:07:04 T440p python3[1032]: [1] INFO: Awaiting commands

neofetch

OS: Manjaro Linux x86_64 
Host: 20ANCTO1WW ThinkPad T440p 
Kernel: 6.0.2-2-MANJARO 
Uptime: 6 mins 
Packages: 2106 (pacman) 
Shell: fish 3.5.1 
Resolution: 1920x1080 
Terminal: /dev/tty2 
CPU: Intel i7-4710MQ (8) @ 3.500GHz 
GPU: NVIDIA GeForce GT 730M 
GPU: Intel 4th Gen Core Processor 
Memory: 690MiB / 15877MiB

Optimus Manager log:

[18] INFO: # Xorg pre-start hook
[18] INFO: Previous state was: {'type': 'pending_pre_xorg_start', 'requested_mode': 'nvidia', 'current_mode': None}
[18] INFO: Requested mode is: nvidia
[46] INFO: Available modules: ['nouveau', 'nvidia', 'nvidia_drm', 'nvidia_modeset', 'nvidia_uvm']
[46] INFO: Unloading modules ['nouveau'] (if loaded)
[49] INFO: switching=none, nothing to do
[73] INFO: Loading module nvidia
[1677] INFO: Loading module nvidia_drm
[1847] INFO: Loaded extra nvidia-mode Xorg options (1 lines)
[1848] INFO: Writing to /etc/X11/xorg.conf.d/10-optimus-manager.conf
[1848] INFO: Writing state {'type': 'pending_post_xorg_start', 'switch_id': '20221027T190702', 'requested_mode': 'nvidia'}
[1849] INFO: Xorg pre-start hook completed successfully.

Nitrooo avatar Oct 27 '22 18:10 Nitrooo

OK, here is the deal: the problem has nothing to do with Optimus Manager but rather with the kernel itself.

The fact that "INFO: Xorg pre-start hook completed successfully." showed up is a good sign that Optimus Manager functions correctly. The service being active, with both of its pre-start processes succeeding, is also a good sign.

The problem is:

ott 27 19:07:04 T440p kernel: [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000200] Failed to allocate NvKmsKapiDevice
ott 27 19:07:04 T440p kernel: [drm:nv_drm_probe_devices [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000200] Failed to register device

After some research, this mostly indicates a hardware problem (for example with the BIOS or an external monitor). Someone was able to solve the issue by resetting the BIOS. It seems like there is something in your system that kernel 6 hates.

You have some ACPI errors that could be causing the problem. Did you install all of the necessary drivers for the ThinkPad? Try sending the output of the journalctl command on a lower kernel (try LTS and then 5.19 RT). I want to know whether the problem is with the new 520 NVIDIA driver, the kernel, or the hardware. Also try disabling Secure Boot temporarily.

Also, @yichangshengwu, send the debug info that I mentioned earlier and hopefully I can help you.
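
When comparing journal output between kernels as suggested above, it helps to pull out only the nvidia-drm error lines and see which ones appear on one kernel but not the other. A minimal sketch, assuming the journal text has been saved to a string per kernel; the names nvidia_drm_errors and only_in_first are hypothetical.

```python
import re

# Matches the tail of kernel log lines such as:
#   "... *ERROR* [nvidia-drm] [GPU ID 0x00000200] Failed to register device"
NVIDIA_DRM_ERROR = re.compile(
    r"\[nvidia-drm\]\s+\[GPU ID (?P<gpu>0x[0-9a-fA-F]+)\]\s+(?P<message>.+)$"
)


def nvidia_drm_errors(journal_text):
    """Return (gpu_id, message) pairs for every nvidia-drm error line in the text."""
    found = []
    for line in journal_text.splitlines():
        if "ERROR" not in line:
            continue
        match = NVIDIA_DRM_ERROR.search(line)
        if match:
            found.append((match.group("gpu"), match.group("message").strip()))
    return found


def only_in_first(first, second):
    """Errors present in the first list but absent from the second, order preserved."""
    return [error for error in first if error not in second]
```

For example, only_in_first(nvidia_drm_errors(journal_on_6_0), nvidia_drm_errors(journal_on_5_19)) would list the errors that show up only on kernel 6.0, where both journal variables are placeholders for the saved output.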

ECO1AI avatar Oct 27 '22 20:10 ECO1AI

@ECO1AI Thank you. However, we already knew what the error message is, as it was stated in the post I linked at the top of this thread. I can't use the 520 driver as the latest Nvidia driver that supports my GPU is 470. I looked for the ACPI errors, but apparently they're junk log entries that can be ignored (they're also shown on 5.19 anyway). The GPU works fine on kernel 5.19, so I'd exclude hardware issues.

I just found this: https://aur.archlinux.org/packages/nvidia-470xx-dkms It looks like the current 470 driver isn't fully compatible with kernel 6 yet. I guess we'll have to wait for Nvidia to release a new version.

Nitrooo avatar Oct 27 '22 20:10 Nitrooo

I managed to make the driver work on kernel 6. Brief instructions here.

Nitrooo avatar Nov 03 '22 12:11 Nitrooo

Reopen if still there.

es20490446e avatar Mar 27 '24 13:03 es20490446e