suspend/resume with vga switch question
Hi,
I use a Thinkpad P50 with FreeBSD main and the nvidia-driver. I have it set to 'discrete graphics' in the firmware and everything works great except for suspend/resume. The system does suspend and then resume and is available from the network, but the screen never comes back to life. It seems like the screen is physically unavailable after resume.
I am wondering if this is due to a "vga switch" (Linux term?), and whether the source-level parts of the nvidia-driver will let me investigate deeper and even fix it. I have no specific experience with ACPI, so this has been a bit of a learning exercise. Would https://github.com/amshafer/nvidia-driver/blob/470_clean/src/nvidia/nvidia_acpi.c#L360 or the _DSM method need some extension to handle the VGA switch? Any feedback is appreciated!
On sddm start the _DSM object looks like it might have grabbed the wrong handle:
ACPI Warning: \134_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20210730/nsarguments-212)
Here is the output after acpiconf -s3 and resume:
rtsx0: Suspend
uhub0: at usbus0, port 1, addr 1 (disconnected)
ugen0.2: <Chicony Electronics Co.,Ltd. Integrated Camera> at usbus0 (disconnected)
ugen0.3: <vendor 0x138a product 0x0090> at usbus0 (disconnected)
ugen0.4: <Generic EMV Smartcard Reader> at usbus0 (disconnected)
ugen0.5: <vendor 0x0765 product 0x5010> at usbus0 (disconnected)
uhid0: at uhub0, port 13, addr 4 (disconnected)
uhid0: detached
ugen0.6: <vendor 0x8087 product 0x0a2b> at usbus0 (disconnected)
ubt0: at uhub0, port 14, addr 5 (disconnected)
ubt0: detached
uhub0: detached
nvme0: waiting
pcib0: failed to set ACPI power state D2 on \134_SB_.PCI0: AE_BAD_PARAMETER
acpi0: cleared fixed power button status
rtsx0: Resume
uhub0 on usbus0
uhub0: <0x8086 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
em0: link state changed to DOWN
em0: link state changed to UP
uhub0: 26 ports with 26 removable, self powered
NVRM: GPU at PCI:0000:01:00: GPU-bdfd89e9-ff1f-b631-7f8e-a7e3241c6671
NVRM: Xid (PCI:0000:01:00): 79, pid=1542, GPU has fallen off the bus.
NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
nvidia-modeset: ERROR: GPU:0: Failed detecting connected display devices
nvidia-modeset: ERROR: GPU:0: Failed detecting connected display devices
nvidia-modeset: ERROR: GPU:0: Failed detecting connected display devices
nvidia-modeset: WARNING: GPU:0: Failure processing EDID for display device Sharp (DP-4).
nvidia-modeset: WARNING: GPU:0: Unable to read EDID for display device Sharp (DP-4)
nvidia-modeset: ERROR: GPU:0: Failure reading maximum pixel clock value for display device DP-4.
ugen0.2: <Chicony Electronics Co.,Ltd. Integrated Camera> at usbus0
ugen0.3: <vendor 0x138a product 0x0090> at usbus0
ugen0.4: <Generic EMV Smartcard Reader> at usbus0
ugen0.5: <vendor 0x0765 product 0x5010> at usbus0
uhid0 on uhub0
uhid0: <vendor 0x0765 product 0x5010, class 0/0, rev 2.00/0.00, addr 4> on usbus0
ugen0.6: <vendor 0x8087 product 0x0a2b> at usbus0
ubt0 on uhub0
ubt0: <vendor 0x8087 product 0x0a2b, class 224/1, rev 2.00/0.01, addr 5> on usbus0
nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x0000000f
nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:1:0:0x0000000f
nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:2:0:0x0000000f
nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:3:0:0x0000000f
nvidia-modeset: ERROR: GPU:0: DP-4: Failed to disable DisplayPort audio stream-0
nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000947d:0:0:0x0000000f
Hi Kevin,
I think the issue here (and I'm no acpi expert either so feel free to correct me) is the following lines:
pcib0: failed to set ACPI power state D2 on \134_SB_.PCI0: AE_BAD_PARAMETER
...
NVRM: Xid (PCI:0000:01:00): 79, pid=1542, GPU has fallen off the bus.
NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
To me that looks like the laptop can't power up the PCI bus by setting it to the D2 state, so the nvidia driver fails to find the card. Then the driver throws an exception (Xid) saying it can't function because there no longer is a connected gpu. The screen never turns on because the card doesn't have adequate power and isn't being driven properly.
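To make it easier to compare logs between suspend/resume attempts, here is a minimal Python sketch that pulls the two kinds of lines discussed above (the failed ACPI power state change and the NVRM Xid report) out of a dmesg excerpt. The regular expressions are modeled only on the messages quoted in this thread, so other kernel or driver versions may word them differently; the function name `scan_resume_log` and the `SAMPLE` text are made up for illustration. It only collects the lines and does not prove that one causes the other.

```python
import re

# Patterns are modeled on the messages quoted in this thread (assumption:
# other kernel or driver versions may format these lines differently).
POWER_RE = re.compile(
    r"^(?P<dev>\w+): failed to set ACPI power state (?P<state>D[0-3]) on "
    r"(?P<path>\S+): (?P<status>AE_\w+)"
)
XID_RE = re.compile(
    r"^NVRM: Xid \(PCI:(?P<bdf>[0-9a-fA-F:.]+)\): (?P<xid>\d+), (?P<rest>.*)$"
)


def scan_resume_log(text):
    """Return (power_failures, xid_events) found in a dmesg excerpt."""
    power_failures = []
    xid_events = []
    for line in text.splitlines():
        line = line.strip()
        m = POWER_RE.match(line)
        if m:
            power_failures.append(m.groupdict())
            continue
        m = XID_RE.match(line)
        if m:
            xid_events.append(
                {"bdf": m["bdf"], "xid": int(m["xid"]), "detail": m["rest"]}
            )
    return power_failures, xid_events


# Hypothetical sample built from lines quoted earlier in the thread.
SAMPLE = r"""
pcib0: failed to set ACPI power state D2 on \134_SB_.PCI0: AE_BAD_PARAMETER
acpi0: cleared fixed power button status
NVRM: Xid (PCI:0000:01:00): 79, pid=1542, GPU has fallen off the bus.
"""

if __name__ == "__main__":
    failures, xids = scan_resume_log(SAMPLE)
    for f in failures:
        print(f"{f['dev']}: {f['state']} on {f['path']} -> {f['status']}")
    for x in xids:
        print(f"GPU {x['bdf']}: Xid {x['xid']} ({x['detail']})")
```

Feeding it the full dmesg text captured after each resume (for example, saved to a file and read in) would show whether the power state failure and the Xid always appear together.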
I would bet that this is caused by the kernel not playing nice with ACPI for the laptop. iirc there's a way to get lots of ACPI debug logging that might tell you what the kernel and hw are disagreeing on. Are you seeing other peripherals on the PCI bus having issues too? Is that ACPI warning line when starting sddm after the first boot or starting sddm after resume?
All that said, suspend/resume with nvidia on bsd should just work; since you're running current, you will already have this patch. You're not using this repo to build the nvidia driver, are you? I'm still working on polishing it, so it has some known issues if you're using nvidia-drm.ko.
I checked and I do have that patch (it landed in freebsd.org repos). I am running ports x11/nvidia-driver with no changes except for OPTIONS ACPI_PM enabled.
So here's a weird observation -- when X11 is running, the screen comes alive early for vt and then dies when it seems to try to switch back into X11. If I suspend/resume from vt without X11 running, this does not take place; the screen just remains off.
I do not see any other issues with suspend/resume on the laptop. Switching to i915 results in correct function.
Ah right, this is an Optimus laptop, isn't it? You probably want to drive the display from i915 for power efficiency reasons, and then use something like this or like this to run certain things on the nvidia gpu.
(Optimus is actually something this repo should eventually help with in the way off distant future)
I'm pretty sure it's just a mux issue on this laptop; it should be able to be a pure nvidia system with the mux in the right state, like it is on boot from the firmware.
I played around a bit with PRIME under X11 and Windows. Even under Windows 11, PRIME (or Optimus?) is kind of a disappointment on a 4k screen. I'd like to use just the dGPU for everything, which does work well on boot. Windows suspends and resumes fine with just the dGPU, so I am pretty sure it's just the MUX needing to be set right on resume.
I do see this with i915kms:
Firmware Warning (ACPI): Possibly buggy BIOS with ACPI_TYPE_INTEGER for function enumeration (20210930/ACPI-2805)
After some experimentation, I have convinced myself you were on the right track. I wonder if the issue is actually in https://cgit.freebsd.org/src/tree/sys/dev/pci/vga_pci.c. I'm going to try some experiments, maybe using FLR (function-level reset).
Any news? I'm curious …