
[BUG] Nvidia dGPU not powering down in hybrid mode

Open DreamingCuttlefish opened this issue 1 year ago • 8 comments

Describe the bug When I set envycontrol to hybrid mode, the nvidia gpu does not power down and continues to suck around 11W of power in the background. This does not occur in integrated mode.

To Reproduce Steps to reproduce the behavior:

  1. Run sudo envycontrol -s hybrid --dm sddm --rtd3 3
  2. reboot
  3. check powertop/nvtop/nvidia-smi
  4. see high power usage of the nvidia gpu, listed as being used by Xorg in nvidia-smi

Expected behavior The nvidia gpu should be powered off unless it is actively being used by an application

Screenshots [4 images attached to the original issue: Screenshot_20230330_020410 and three untitled images]

System Information:

  • Model: Razer Blade 14 2022 RZ09-0427x
  • Distro: Arch Linux
  • Kernel: 6.2.8-arch1-1
  • DE/WM and Display Manager (if applicable): KDE Plasma & SDDM (I made sure to set envycontrol to use sddm)
  • EnvyControl version: 3.2.0-2
  • Nvidia driver version: 530.41.03-1
  • lspci output:
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h-19h PCIe Root Complex (rev 01)
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Family 17h-19h IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h-19h PCIe Dummy Host Bridge (rev 01)
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 14b8
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h-19h PCIe Dummy Host Bridge (rev 01)
00:02.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h-19h PCIe GPP Bridge
00:02.4 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h-19h PCIe GPP Bridge
00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h-19h PCIe Dummy Host Bridge (rev 01)
00:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 19h USB4/Thunderbolt PCIe tunnel
00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h-19h PCIe Dummy Host Bridge (rev 01)
00:04.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 19h USB4/Thunderbolt PCIe tunnel
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h-19h PCIe Dummy Host Bridge (rev 01)
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h-19h Internal PCIe GPP Bridge (rev 10)
00:08.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h-19h Internal PCIe GPP Bridge (rev 10)
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 71)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Rembrandt Data Fabric: Device 18h; Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Rembrandt Data Fabric: Device 18h; Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Rembrandt Data Fabric: Device 18h; Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Rembrandt Data Fabric: Device 18h; Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Rembrandt Data Fabric: Device 18h; Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Rembrandt Data Fabric: Device 18h; Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Rembrandt Data Fabric: Device 18h; Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Rembrandt Data Fabric: Device 18h; Function 7
01:00.0 VGA compatible controller: NVIDIA Corporation GA104 [Geforce RTX 3070 Ti Laptop GPU] (rev a1)
02:00.0 Network controller: Intel Corporation Wi-Fi 6 AX210/AX211/AX411 160MHz (rev 1a)
03:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
64:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt [Radeon 680M] (rev c7)
64:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt Radeon High Definition Audio Controller
64:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] VanGogh PSP/CCP
64:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Rembrandt USB4 XHCI controller #3
64:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Rembrandt USB4 XHCI controller #4
64:00.5 Multimedia controller: Advanced Micro Devices, Inc. [AMD] ACP/ACP3X/ACP6x Audio Coprocessor (rev 60)
64:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h/19h HD Audio Controller
65:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] Rembrandt USB4 XHCI controller #8
65:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Rembrandt USB4 XHCI controller #5
65:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Rembrandt USB4 XHCI controller #6
65:00.5 USB controller: Advanced Micro Devices, Inc. [AMD] Rembrandt USB4/Thunderbolt NHI controller #1
65:00.6 USB controller: Advanced Micro Devices, Inc. [AMD] Rembrandt USB4/Thunderbolt NHI controller #2

Additional context n/a

DreamingCuttlefish avatar Mar 30 '23 02:03 DreamingCuttlefish

Also, the laptop will not suspend in hybrid mode either.

DreamingCuttlefish avatar Mar 30 '23 02:03 DreamingCuttlefish

Try with a different value for the rtd3 flag

bayasdev avatar Mar 30 '23 02:03 bayasdev

> Try with a different value for the rtd3 flag

I've tried setting rtd3 to 1, 2, and 3 (no reason to try 0) and all of them return the same result when I run nvidia-smi in terms of gpu power usage
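
Before comparing rtd3 values, it can help to confirm which value the loaded driver actually reports after the reboot. The sketch below assumes the NVIDIA driver exposes its module parameters in /proc/driver/nvidia/params as `Name: value` lines; the function name `nvidia_rtd3_value` and the `PARAMS_FILE` override are hypothetical and exist only for illustration.

```shell
# Print the DynamicPowerManagement value reported by the loaded NVIDIA driver.
# PARAMS_FILE is a hypothetical override (handy for testing without a GPU);
# by default the driver's procfs parameter dump is read.
nvidia_rtd3_value() {
    params="${PARAMS_FILE:-/proc/driver/nvidia/params}"
    # Bail out early if the driver is not loaded or the file is unreadable.
    [ -r "$params" ] || { echo "unavailable"; return 1; }
    # Match the parameter name exactly so that longer names sharing the
    # same prefix are not picked up by accident.
    awk -F': *' '$1 == "DynamicPowerManagement" { print $2; found = 1 }
                 END { exit !found }' "$params"
}
```

Calling `nvidia_rtd3_value` with no arguments prints the value if the parameter is present. If it does not match what was passed to --rtd3, the setting probably never reached the driver, which would explain identical results for 1, 2, and 3.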

DreamingCuttlefish avatar Mar 30 '23 02:03 DreamingCuttlefish

> > Try with a different value for the rtd3 flag
>
> I've tried setting rtd3 to 1, 2, and 3 (no reason to try 0) and all of them return the same result when I run nvidia-smi in terms of gpu power usage

AFAIK nvidia-smi wakes up the GPU (even on Windows)

Try

sudo cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_status
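
A small sketch of the same check as a shell function, reading power/control alongside runtime_status from sysfs instead of calling nvidia-smi. The function name `gpu_runtime_status` and the `SYSFS_ROOT` override are made up for illustration; the default PCI address 0000:01:00.0 is the one from the lspci output in this issue and may differ on other machines.

```shell
# Report the kernel's runtime power-management view of a PCI device.
# SYSFS_ROOT is a hypothetical override so the function can be tried
# against a fake directory tree; it defaults to the real /sys.
gpu_runtime_status() {
    dev="${1:-0000:01:00.0}"
    root="${SYSFS_ROOT:-/sys}"
    base="$root/bus/pci/devices/$dev/power"
    # runtime_status: active, suspended, suspending, resuming, error, unsupported
    # control: "auto" lets the kernel runtime-suspend the device, "on" keeps it powered
    for f in runtime_status control; do
        if [ -r "$base/$f" ]; then
            printf '%s: %s\n' "$f" "$(cat "$base/$f")"
        else
            printf '%s: unavailable\n' "$f"
        fi
    done
}
```

Run it as `gpu_runtime_status 0000:01:00.0`. On a working hybrid setup one would hope to see `runtime_status: suspended` while nothing is using the dGPU; `active` together with `control: on` would suggest that runtime power management is not enabled for the device at all.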

bayasdev avatar Mar 30 '23 03:03 bayasdev

Here's the output running that command right after rebooting my laptop: [screenshot]

DreamingCuttlefish avatar Mar 30 '23 03:03 DreamingCuttlefish

I'm having the same issue.

Operating System: EndeavourOS 
KDE Plasma Version: 5.27.3
KDE Frameworks Version: 5.104.0
Qt Version: 5.15.8
Kernel Version: 6.2.8-zen1-1-zen (64-bit)
Graphics Platform: X11
Processors: 16 × Intel® Core™ i7-10875H CPU @ 2.30GHz
Memory: 15.4 GiB of RAM
Graphics Processor: Mesa Intel® UHD Graphics, NVIDIA GeForce RTX 2060 with Max-Q Design/PCIe/SSE2
Manufacturer: Dell Inc.
Product Name: XPS 17 9700

My laptop also refuses to sleep or hibernate because of this issue.

[ 211.472546] nvidia 0000:01:00.0: PM: pci_pm_suspend(): nv_pmops_suspend+0x0/0xf0 [nvidia] returns -5
[ 211.472901] nvidia 0000:01:00.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x1c0 returns -5
[ 211.472986] nvidia 0000:01:00.0: PM: failed to suspend async: error -5
[ 213.086448] PM: Some devices failed to suspend, or early wake event detected
[ 214.418818] nvidia 0000:01:00.0: PM: pci_pm_suspend(): nv_pmops_suspend+0x0/0xf0 [nvidia] returns -5
[ 214.411152] nvidia 0000:01:00.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x1c0 returns -5
[ 214.411158] nvidia 0000:01:00.0: PM: failed to suspend async: error -5
[ 214.689623] PM: Some devices failed to suspend, or early wake event detected
[ 473.114476] nvidia 0000:01:00.0: PM: pci_pm_suspend(): nv_pmops_suspend+0x0/0xf0 [nvidia] returns -5
[ 473.114817] nvidia 0000:01:00.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x1c0 returns -5
[ 473.114822] nvidia 0000:01:00.0: PM: failed to suspend async: error -5
[ 474.711145] PM: Some devices failed to suspend, or early wake event detected
[ 476.238492] nvidia 0000:01:00.0: PM: pci_pm_suspend(): nv_pmops_suspend+0x0/0xf0 [nvidia] returns -5
[ 476.232369] nvidia 0000:01:00.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x1c0 returns -5
[ 476.232376] nvidia 0000:01:00.0: PM: failed to suspend async: error -5
[ 476.518286] PM: Some devices failed to suspend, or early wake event detected
[ 488.571695] nvidia 0000:01:00.0: PM: pci_pm_suspend(): nv_pmops_suspend+0x0/0xf0 [nvidia] returns -5
[ 488.572835] nvidia 0000:01:00.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x1c0 returns -5
[ 488.572848] nvidia 0000:01:00.0: PM: failed to suspend async: error -5
[ 488.782814] PM: Some devices failed to suspend, or early wake event detected
[ 498.878667] nvidia 0000:01:00.0: PM: pci_pm_suspend(): nv_pmops_suspend+0x0/0xf0 [nvidia] returns -5
[ 498.871881] nvidia 0000:01:00.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x1c0 returns -5
[ 498.871887] nvidia 0000:01:00.0: PM: failed to suspend async: error -5
[ 498.286859] PM: Some devices failed to suspend, or early wake event detected
[ 497.885315] ACPI Error: Thread 3384661248 cannot release Mutex [ECMX] acquired by thread 1705034240 (20221020/exmutex-378)
[ 497.885391] ACPI Error: Aborting method \_SB.PCI0.LPCB.ECDV._Q66 due to previous error (AE_AML_NOT_OWNER) (20221020/psparse-529)
[ 498.751962] Bluetooth: hci0: Opcode 0x c24 failed: -110
[ 498.751972] Bluetooth: hci0: command 0x0408 tx timeout
[ 498.767386] Bluetooth: hci0: Opcode 0x 408 failed: -107
[ 498.784417] Bluetooth: hci0: Suspend notifier action (3) failed: -107
[ 498.785315] Bluetooth: hci0: unexpected event for opcode 0x2042

jSQrD-dev avatar Mar 31 '23 02:03 jSQrD-dev

@J-SQRD-Dev

sudo systemctl enable nvidia-{suspend,resume,hibernate}
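
To confirm that the three units actually ended up enabled, something along these lines can be run afterwards. This is only a sketch: it assumes the driver package ships the units as nvidia-suspend.service, nvidia-resume.service, and nvidia-hibernate.service, and the function name `check_nvidia_sleep_units` and the `SYSTEMCTL` override are hypothetical.

```shell
# Print the enablement state of the NVIDIA suspend/resume/hibernate units.
# SYSTEMCTL is a hypothetical override so a stub can stand in for systemctl;
# the function returns non-zero if any unit is not reported as enabled.
check_nvidia_sleep_units() {
    sc="${SYSTEMCTL:-systemctl}"
    rc=0
    for unit in nvidia-suspend nvidia-resume nvidia-hibernate; do
        # "systemctl is-enabled" prints the state and exits non-zero
        # when the unit is disabled or cannot be found.
        state=$("$sc" is-enabled "$unit.service" 2>/dev/null) || rc=1
        printf '%s.service: %s\n' "$unit" "${state:-unknown}"
    done
    return "$rc"
}
```

If any line reads `disabled` or `unknown`, the `systemctl enable` command above either did not apply or the unit names differ on that distro.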

bayasdev avatar Mar 31 '23 02:03 bayasdev

@bayasdev Thank you.

jSQrD-dev avatar Mar 31 '23 02:03 jSQrD-dev