CoreFreq

Steam Deck

Status: Open. wallentx opened this issue 2 years ago • 34 comments

Hello @cyring! Following up on our chat on Reddit.

Now that they've fixed the kernel headers package, I can build again.


          '###############             
             ,#############            Name:                AMD Custom APU 0405
                      .####            Microarchitecture:   Zen 2
              #.      .####            Technology:          7nm
            :##.      .####            Max Frequency:       2.799 GHz
           :###.      .####            Cores:               4 cores (8 threads)
           #########.   :##            AVX:                 AVX,AVX2
           #######.       ;            FMA:                 FMA3
                                       L1i Size:            32KB (128KB Total)
    ###     ###      ###   #######     L1d Size:            32KB (128KB Total)
   ## ##    #####  #####   ##     ##   L2 Size:             512KB (2MB Total)
  ##   ##   ### #### ###   ##      ##  L3 Size:             4MB
 #########  ###  ##  ###   ##      ##  Peak Performance:    358.27 GFLOP/s
##       ## ###      ###   ##     ##   
##       ## ###      ###   #######     

As a first step, I added these kernel command-line parameters: modprobe.blacklist=acpi_cpufreq idle=halt tsc=unstable. Then I updated GRUB and rebooted.

The Deck booted into the "Game Mode" UI (which is an Xwayland process) just fine, but switching to "Desktop Mode" (which is X11) made sddm fail to start the desktop, leaving me with just a black screen. I was able to switch to TTY4 and look at the journal, sddm and X11 logs, but there was quite a bit to sort through; the Deck does quite a lot in Desktop Mode. I'm not sure how familiar you are with it, but it's an immutable Arch Linux setup with an A/B partition scheme:

lsblk -f
NAME        FSTYPE FSVER LABEL  UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
mmcblk0                                                                             
└─mmcblk0p1 ext4   1.0          9571c35d-19ad-4c6f-a4e0-cefb4578854d    937G     0% /run/media/deck/9571c35d-19ad-4c6f-a4e0-cefb4578854d
zram0                                                                               [SWAP]
nvme0n1                                                                             
├─nvme0n1p1 vfat   FAT16 esp    5478-3964                                           
├─nvme0n1p2 vfat   FAT16 efi    5478-EA6B                                           
├─nvme0n1p3 vfat   FAT16 efi    5479-935D                                           
├─nvme0n1p4 btrfs        rootfs 5719b9c0-eb87-44fc-9be8-83a9ae064bd7  863.9M    81% /
├─nvme0n1p5 btrfs        rootfs cf367b21-6580-4b8e-82cd-0ffd8f2b4239                
├─nvme0n1p6 ext4   1.0   var    2127a069-82fc-4385-ab82-a6f895dc5806  168.8M    19% /var
├─nvme0n1p7 ext4   1.0   var    0837e2ba-9eeb-4feb-a17d-6ae0a3aaec7a                
└─nvme0n1p8 ext4   1.0   home   ca1d4bea-2044-4a47-96c4-3559fea4007e  304.7G    67% /var/tmp
                                                                                    /var/log
                                                                                    /var/lib/systemd/coredump
                                                                                    /var/lib/steamos-log-submitter
                                                                                    /var/lib/flatpak
                                                                                    /var/lib/docker
                                                                                    /var/cache/pacman
                                                                                    /srv
                                                                                    /root
                                                                                    /opt
                                                                                    /nix
                                                                                    /home

It does seem to do some health checking at startup to determine whether it shut down cleanly, and I think it can automatically swap over to the alternate partitions if the current one is broken, but I've had to switch manually a few times after getting things into a bad state.

There's also a spiderweb of systemd services that I'd need a flowchart to follow.

All of that aside, debugging why Desktop Mode failed to start is my chore; the part you might be interested in is how CoreFreq behaves on the Deck.

I enabled sshd on the Deck and SSHed into it while it was booted into the default Xwayland "Game Mode" UI, so as not to trigger any odd failures. I checked dmesg and journalctl for errors that looked out of place and didn't see much. I built CoreFreq from master (develop doesn't seem to be working at the moment) with:

make DELAY_TSC=1 clean all
sudo make install
sudo insmod corefreqk.ko Register_ClockSource=1 Register_CPU_Freq=1 Register_Governor=1 Register_CPU_Idle=1 Experimental=1
echo "corefreq_tsc" | sudo tee /sys/devices/system/clocksource/clocksource0/current_clocksource
sudo systemctl start corefreqd

and immediately saw a bunch of these messages in the systemd journal:

------------[ cut here ]------------
Unable to find AMD Northbridge id for 0000:00:18.0
WARNING: CPU: 2 PID: 0 at arch/x86/include/asm/amd_nb.h:98 CCD_AMD_Family_17h_Zen2_Temp+0x43e/0x730 [corefreqk]
Modules linked in: corefreqk(OE) tls uinput rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ccm algif_aead cbc des_generic libdes ecb md4 nf_tables ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security nfnetlink ip6table_filter ip6_tables iptable_filter cmac algif_hash algif_skcipher af_alg bnep ramoops reed_solomon mousedev amdgpu snd_acp5x_pcm_dma snd_acp5x_i2s intel_rapl_msr snd_soc_acp5x_mach intel_rapl_common snd_sof_amd_vangogh snd_sof_amd_acp edac_mce_amd snd_sof_pci snd_soc_cs35l41_spi snd_sof_xtensa_dsp btusb rtw88_8822ce btrtl kvm_amd snd_sof btbcm snd_soc_cs35l41 hid_multitouch snd_sof_utils snd_hda_codec_hdmi btintel snd_soc_wm_adsp rtw88_8822c kvm amdgpu_xcp_drv drm_buddy gpu_sched cs_dsp rtw88_pci drm_ttm_helper
 irqbypass snd_hda_intel btmtk rtw88_core ttm snd_soc_nau8821 snd_soc_cs35l41_lib snd_intel_dspcfg crct10dif_pclmul snd_intel_sdw_acpi drm_display_helper crc32_pclmul cdc_acm joydev bluetooth hid_steam atkbd snd_soc_core snd_hda_codec mac80211 polyval_clmulni snd_pci_acp5x cec snd_hda_core ecdh_generic polyval_generic snd_acp_config snd_hwdep gf128mul ccp snd_soc_acpi libps2 libarc4 ghash_clmulni_intel sha512_ssse3 snd_compress cdc_mbim aesni_intel ac97_bus snd_pcm_dmaengine cdc_wdm crypto_simd snd_pcm cryptd rapl vivaldi_fmap wdat_wdt pcspkr cfg80211 snd_timer ltrf216a opt3001 video sp5100_tco mmc_block i2c_hid_acpi i2c_piix4 snd rfkill wmi 8250_dw soundcore industrialio i2c_hid cdc_ncm mac_hid cdc_ether usbnet mii pkcs8_key_parser crypto_user fuse dm_mod loop zram bpf_preload ip_tables x_tables overlay ext4 crc16 mbcache jbd2 usbhid vfat fat btrfs blake2b_generic libcrc32c crc32c_generic xor raid6_pq nvme sdhci_pci extcon_steamdeck serio_raw leds_steamdeck steamdeck_hwmon
 cqhci nvme_core sdhci crc32c_intel i8042 xhci_pci mmc_core xhci_pci_renesas steamdeck serio nvme_common spi_amd
CPU: 2 PID: 0 Comm: swapper/2 Tainted: G        W  OE      6.1.52-valve12-1-neptune-61 #1 17e6c6d2eaae83165cc4ac3400ca3b252198e6eb
Hardware name: Valve Jupiter/Jupiter, BIOS F7A0120 12/01/2023
RIP: 0010:CCD_AMD_Family_17h_Zen2_Temp+0x43e/0x730 [corefreqk]
Code: 01 0f 95 c0 21 c6 e9 fb fc ff ff 49 8b b7 20 01 00 00 48 85 f6 75 07 49 8b b7 d0 00 00 00 48 c7 c7 d0 65 12 c2 e8 32 2e fa d9 <0f> 0b 31 db e8 99 12 f8 d9 66 39 c3 0f 83 31 fe ff ff 48 8b 05 61
RSP: 0018:ffffbd78c01ece38 EFLAGS: 00010082
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027
RDX: ffff9a6b6eca0728 RSI: 0000000000000001 RDI: ffff9a6b6eca0720
RBP: 0000000000000050 R08: 0000000000000000 R09: ffffbd78c01ecca8
R10: 0000000000000003 R11: ffff9a6b7ef7ffe8 R12: 0000000000000001
R13: ffff9a68d8070000 R14: 0000000000000000 R15: ffff9a684158d000
FS:  0000000000000000(0000) GS:ffff9a6b6ec80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000326608116000 CR3: 0000000105d9e000 CR4: 0000000000350ee0
Call Trace:
 <IRQ>
 ? CCD_AMD_Family_17h_Zen2_Temp+0x43e/0x730 [corefreqk b5775744e63272f438e8a834527df78680ee2742]
 ? __warn+0x7d/0xd0
 ? CCD_AMD_Family_17h_Zen2_Temp+0x43e/0x730 [corefreqk b5775744e63272f438e8a834527df78680ee2742]
 ? report_bug+0xe6/0x150
 ? handle_bug+0x3a/0x70
 ? exc_invalid_op+0x17/0x70
 ? asm_exc_invalid_op+0x1a/0x20
 ? CCD_AMD_Family_17h_Zen2_Temp+0x43e/0x730 [corefreqk b5775744e63272f438e8a834527df78680ee2742]
 ? CCD_AMD_Family_17h_Zen2_Temp+0x43e/0x730 [corefreqk b5775744e63272f438e8a834527df78680ee2742]
 ? Call_SVI_APU+0x7b0/0x7b0 [corefreqk b5775744e63272f438e8a834527df78680ee2742]
 Cycle_AMD_Family_17h+0x368/0x5d0 [corefreqk b5775744e63272f438e8a834527df78680ee2742]
 Cycle_AMD_F17h_Zen2_SP+0xb0/0xe0 [corefreqk b5775744e63272f438e8a834527df78680ee2742]
 ? Cycle_AMD_F17h_Zen+0xe0/0xe0 [corefreqk b5775744e63272f438e8a834527df78680ee2742]
 __hrtimer_run_queues+0x10f/0x2b0
 hrtimer_interrupt+0xf8/0x210
 __sysvec_apic_timer_interrupt+0x5e/0x110
 sysvec_apic_timer_interrupt+0x6d/0x90
 </IRQ>
 <TASK>
 asm_sysvec_apic_timer_interrupt+0x1a/0x20
RIP: 0010:cpuidle_enter_state+0xdc/0x410
Code: 49 89 c5 0f 1f 44 00 00 31 ff e8 0f f0 7d ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 20 03 00 00 31 ff e8 98 a9 84 ff fb 45 85 f6 <0f> 88 85 01 00 00 49 63 d6 48 8d 04 52 48 8d 04 82 49 8d 0c c4 48
RSP: 0018:ffffbd78c0113e90 EFLAGS: 00000202
RAX: ffff9a6b6ecb1b40 RBX: ffffdd78bfaadfb0 RCX: 00000000000000f0
RDX: 0000000000000002 RSI: 000000adf769bc71 RDI: 0000000000000000
RBP: 0000000000000002 R08: ffffffffffcd237b R09: 000000002db7ce55
R10: ffff9a6b6ecb0644 R11: 0000000000000688 R12: ffffffffc21380a8
R13: 000000adf769bc71 R14: 0000000000000002 R15: 0000000000000000
 cpuidle_enter+0x2d/0x40
 do_idle+0x1df/0x260
 cpu_startup_entry+0x1d/0x20
 start_secondary+0x123/0x140
 secondary_startup_64_no_verify+0xe5/0xeb
 </TASK>
---[ end trace 0000000000000000 ]---

The CPU number varied between messages. Testing the CLI, everything seemed to actually work for the most part. I got a few "invalid clock source" errors when trying to change some of the max frequency values. Also, CPPC is OFF by default; I was able to toggle it on, but once it was enabled, there was no way for me to turn it off.

Here's all the info I could dump from corefreq-cli: https://gist.github.com/wallentx/9788fb6f304dbe598440c401a0ce80ed

Please let me know if there's any other info that I can grab for, or anything I can test that might be useful.

wallentx, Dec 31 '23 01:12

Hello,

Thank you for your report. I'm glad you are working through the kernel requirements on this platform; I'm indeed not familiar with handheld PCs.


From now on, I suggest you switch to the develop branch. On the surface, the main difference is that the Makefile now places the CoreFreq binaries in the build sub-directory.

make -j for parallel building works, but make -j clean all does not. To fully rebuild, run make clean; make -j.


Your CoreFreq output is really helpful. Here are all the issues I can see in it:

  • Temperature TjMax is missing.

  • Power values TDP, PP#, PPT and Amps are missing, all because the firmware/SMU protocol is not implemented.

  • The _CPC object from the ACPI tables has not been retrieved through kernel means.

  • CPPC ratios are not available, but CPPC EPP is capable! The develop branch now shows CPPC in a dedicated window. As with Intel, Hardware CPPC can only be switched on: those are write-once registers, whereas Firmware CPPC can be toggled on/off at will if the BIOS allows it.

  • The Memory Controller was not probed, as shown by: AuthenticAMD [ 0]. Can you add the PCI list from lspci -nn to your Gist?

  • About the kernel dump: the thermal query function has to be changed to another one (this is the first "Van Gogh" report I'm receiving).

Still on the develop branch, let's try another function as follows:

  1. edit x86_64/corefreqk.h file
  2. go to section [AMD_Zen2_Jupiter] (line 11494)
  3. replace .Query = Query_AMD_F17h_PerCluster, with .Query = Query_AMD_F17h_PerSocket,
  4. save file and exit
  5. rebuild and provide the sensors output of corefreq-cli -C 1
  6. also check whether the kernel still dumps the same error.

I must fix the name to [Zen2/Van Gogh].

cyring, Dec 31 '23 02:12

In addition to the above, I'd appreciate seeing some photos of the Steam Deck with CoreFreq running; at least two of the UI:

  • In pure tty console
  • In a Term of the window manager

These should help me verify the ANSI colors and other ASCII layout support in SteamOS.

cyring, Dec 31 '23 09:12

I'll do all of those and post an updated gist this evening.

Also, something that might be of interest: https://www.tomshardware.com/pc-components/cpus/steam-decks-custom-amd-processor-exposed

High Yield further speculates that original Steam Decks may be able to utilize the CVPE hardware that currently goes unused. However, that depends on whether AMD manually disables the CVPE using a laser or if it's merely turned off by firmware. It's also not clear how well modders would be able to utilize the CVPE since it's used exclusively in Magic Leap hardware and software

I'm curious if corefreq is able to see any of this unused hardware.

wallentx, Dec 31 '23 20:12

I'm curious if corefreq is able to see any of this unused hardware.

CoreFreq is register-based software. Any feature bit that is well documented can be added to CoreFreq.

AMD specifies the processor registers in a PPR datasheet. See the project Wiki for the manufacturers' documentation. Those are highly technical, but they are the bare minimum I need to program a processor.

In short, I need the register datasheet for the technologies mentioned in the article.


Does the Steam Deck have the 7nm Zen2 ? Because I'm wondering if the latest Deck, based on a 6nm, won't have the same architecture detail ? But will be identified with same CPUID Family/Model but a different Stepping ?

cyring, Dec 31 '23 21:12

Does the Steam Deck have the 7nm Zen2 ?

cpufetch reports my device as 7nm.

Because I'm wondering if the latest Deck, based on a 6nm, won't have the same architecture detail ? But will be identified with same CPUID Family/Model but a different Stepping ?

That's a good question. I think people are only just now receiving their OLED Steam Decks. Geekbench reports the CPU name incorrectly; however, if you compare this result from what I assume is the OLED Deck with the "Sephiroth" APU: https://browser.geekbench.com/v6/cpu/4206516

AuthenticAMD Family 23 Model 145 Stepping 0

with my result: https://browser.geekbench.com/v6/cpu/4204172

AuthenticAMD Family 23 Model 144 Stepping 2

it would seem so? Or maybe "Model" indicates otherwise. I don't know anyone with an OLED Deck, but if I find someone, I'll see if they can dump some info.

wallentx, Dec 31 '23 21:12

it would seem so? Or maybe "Model" indicates otherwise. I don't know anyone with an OLED Deck, but if I find out of someone, I'll see if they can dump some info.

I appreciate it a lot. Thank you.


Brand            nm  CPUID  Architecture  Codename  SKU        Model
Steam Deck LCD   7   8F_90  Zen2          Van Gogh  Aerith     Valve Jupiter
Steam Deck OLED  6   8F_91  Zen2          Van Gogh  Sephiroth  Valve Galileo
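The CPUID column can be related to the Geekbench "Family 23 Model 144/145" strings using AMD's documented layout of CPUID Fn0000_0001 EAX. This sketch assumes "8F_90" reads as extended family 8, base family F, model 0x90; CoreFreq's own notation may differ in detail. The struct and function names are hypothetical. For example, an EAX of 0x00890F02 (constructed from the LCD Deck's reported triple, not read from a Deck) decodes to Family 23 (17h), Model 144 (90h), Stepping 2.

```c
#include <stdint.h>

/* Hypothetical result type for this sketch. */
struct amd_sig {
	unsigned family;
	unsigned model;
	unsigned stepping;
};

/*
 * Decode CPUID Fn0000_0001 EAX using AMD's rule: the extended family and
 * extended model fields are only folded in when the base family is 0xF.
 */
struct amd_sig decode_amd_signature(uint32_t eax)
{
	unsigned base_family = (eax >> 8) & 0xf;   /* bits [11:8]  */
	unsigned base_model  = (eax >> 4) & 0xf;   /* bits [7:4]   */
	unsigned ext_model   = (eax >> 16) & 0xf;  /* bits [19:16] */
	unsigned ext_family  = (eax >> 20) & 0xff; /* bits [27:20] */
	struct amd_sig s;

	s.stepping = eax & 0xf;                    /* bits [3:0]   */
	if (base_family == 0xf) {
		s.family = base_family + ext_family;   /* 0xF + 0x8 = 0x17 = 23 */
		s.model  = (ext_model << 4) | base_model;
	} else {
		s.family = base_family;
		s.model  = base_model;
	}
	return s;
}
```

With the same assumption, the OLED Deck's Model 145 Stepping 0 would correspond to an EAX of 0x00890F10, i.e. only the base model and stepping fields differ.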

cyring, Dec 31 '23 22:12

It's a bit confusing, but as I understand it, the APUs in both the LCD, and OLED deck are based on the Van Gogh architecture. The SKU that Valve has given the 7nm chip is "Aerith" (line 8 from corefreq-cli -B), and the SKU for the 6nm is "Sephiroth". Both are still Van Gogh, though.

https://www.phoronix.com/benchmark/result/steam-deck-oled-vs-steam-deck-lcd-benchmarks/result.svgz

wallentx, Dec 31 '23 22:12

It's a bit confusing, but as I understand it, the APUs in both the LCD, and OLED deck are based on the Van Gogh architecture. The SKU that Valve has given the 7nm chip is "Aerith" (line 8 from corefreq-cli -B), and the SKU for the 6nm is "Sephiroth". Both are still Van Gogh, though.

https://www.phoronix.com/benchmark/result/steam-deck-oled-vs-steam-deck-lcd-benchmarks/result.svgz

Thanks, I fixed the table above, and commit 2e8ee92aa0a281aa51ef12c8555e4d92b263eff2 is released.

cyring, Dec 31 '23 22:12

I think there is a different issue with develop as well:

(5)(william@steamdeck build)$ ./corefreqd
corefreqd execution error code 13
Permission denied @ line 9386
(6)(william@steamdeck build)$ sudo ./corefreqd
Driver connection error code 13
Version 0.0.0: 'Permission denied' @ line 9353

This is what I was running into yesterday when building from develop.

wallentx, Dec 31 '23 23:12

Permission denied @ line 9386

You were not root, but you found that out afterward.

Permission denied' @ line 9353

This is an API mismatch between the driver and the daemon.

You have to fully unload any previous version of the kernel module from memory; CoreFreq prevents a mismatched daemon and driver from working together.

As root:

  1. rmmod corefreqk
  2. lsmod | grep corefreqk ## output has to be empty
  3. Uninstall the master version if it was installed.
  4. Build the develop version.
  5. insmod build/corefreqk.ko ## explicitly load the module from its path
  6. ./build/corefreqd -d ## Daemon is started in trace mode

As user:

  1. ./build/corefreq-cli
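Step 2 above (lsmod | grep corefreqk must print nothing) can also be checked from a program. This is a minimal sketch with a hypothetical helper, module_loaded, that scans a /proc/modules-style file, assuming the usual layout where the first field of each line is the module name. It matches the name exactly, so corefreqk does not match a module named corefreq, and it does not look at the module's version, which is what actually triggered the API mismatch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if `name` is the first field of some line
 * in a /proc/modules-style file, 0 if not, -1 if the file can't be opened.
 */
int module_loaded(const char *modules_path, const char *name)
{
	char buf[512];
	int at_line_start = 1;
	int found = 0;
	FILE *f = fopen(modules_path, "r");

	if (f == NULL)
		return -1;

	while (!found && fgets(buf, sizeof(buf), f) != NULL) {
		if (at_line_start) {
			/* Length of the first field (the module name). */
			size_t len = strcspn(buf, " \t\n");
			found = len == strlen(name) && strncmp(buf, name, len) == 0;
		}
		/* Long lines arrive in chunks; only a '\n' ends a line. */
		at_line_start = strchr(buf, '\n') != NULL;
	}
	fclose(f);
	return found;
}
```

For example, module_loaded("/proc/modules", "corefreqk") should return 0 before running insmod build/corefreqk.ko.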

cyring, Dec 31 '23 23:12

Updated info: https://gist.github.com/wallentx/1bde7993752c2da0f6baf5da625b5309

wallentx, Jan 01 '24 00:01

Updated info: https://gist.github.com/wallentx/1bde7993752c2da0f6baf5da625b5309

Thank you.

Was this test made with .Query = Query_AMD_F17h_PerSocket, as requested above, or did you just run the branch without touching the code?

cyring, Jan 01 '24 01:01

Was this test made with .Query = Query_AMD_F17h_PerSocket, as requested above or you just ran the branch without touching code ?

[Screenshot: 2023-12-31_19-51]

wallentx, Jan 01 '24 01:01

Unfortunately, we get a kernel dump with both Query functions.

What both cases have in common is the SMU call made to retrieve the temperature. See: https://github.com/cyring/CoreFreq/blob/c69fe378217982ab2b2abdff79edf23c0bd7c50b/x86_64/corefreqk.c#L15692

To check whether we can talk to the SMU, you can compile zencli and run it with common SMU addresses:

## Thermal
### SMU_AMD_THM_TCTL_REGISTER_F17H

./zencli smu 0x00059800

### SMU_AMD_THM_TCTL_CCD_REGISTER_F19H_61H

./zencli smu 0x00059B08

## Frequency 
### SMU_AMD_F17H_ZEN2_MCM_COF

./zencli smu 0x0005d324

zencli is user-space software with no kernel module; just run it as root.

There are a bunch of AMD SMU registers in amd_reg. If none of them works, it means we don't know the SMU memory map of Van Gogh or, worse, there is no SMU in this APU.

cyring, Jan 01 '24 02:01

(william@steamdeck zencli)$ sudo ./zencli smu 0x00059800 && sudo ./zencli smu 0x00059B08 && sudo ./zencli smu 0x0005d324
[0x00059800] READ(smu) = 0x31000fef (822087663)
   60   56   52   48   44   40   36   32   28   24   20   16   12   08   04   00
 0000 0000 0000 0000 0000 0000 0000 0000 0011 0001 0000 0000 0000 1111 1110 1111
[0x00059b08] READ(smu) = 0x00000000 (0)
   60   56   52   48   44   40   36   32   28   24   20   16   12   08   04   00
 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
[0x0005d324] READ(smu) = 0x01000000 (16777216)
   60   56   52   48   44   40   36   32   28   24   20   16   12   08   04   00
 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0000 0000 0000 0000 0000 0000
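A possible reading of the first value: on other Zen parts, the Linux k10temp driver decodes this same SMN address (0x00059800) as CUR_TEMP in bits [31:21] at 0.125 °C per step, minus 49 °C when a range-select encoding is set. Assuming Van Gogh follows that layout (not confirmed in this thread), the sketch below decodes 0x31000fef to 49000 millidegrees, i.e. a Tctl of about 49 °C. The macro and function names are hypothetical and only mirror the k10temp logic; Tctl may also include a vendor offset relative to the real die temperature.

```c
#include <stdint.h>

/* Field layout as used by Linux k10temp on Zen; assumed to hold for Van Gogh. */
#define ZEN_CUR_TEMP_SHIFT          21          /* CUR_TEMP is bits [31:21]      */
#define ZEN_CUR_TEMP_RANGE_SEL_MASK (1u << 19)  /* set: scale shifted by -49 C   */
#define ZEN_CUR_TEMP_TJ_SEL_MASK    (3u << 16)  /* bits [17:16] both set: -49 C  */

/* Decode a raw 0x00059800 value into Tctl in millidegrees Celsius. */
long zen_tctl_millicelsius(uint32_t regval)
{
	/* One CUR_TEMP step is 0.125 C, i.e. 125 millidegrees. */
	long temp = (long)(regval >> ZEN_CUR_TEMP_SHIFT) * 125;

	if ((regval & ZEN_CUR_TEMP_RANGE_SEL_MASK) ||
	    (regval & ZEN_CUR_TEMP_TJ_SEL_MASK) == ZEN_CUR_TEMP_TJ_SEL_MASK)
		temp -= 49000;

	return temp;
}
```

Here 0x31000fef >> 21 is 392, and 392 × 125 = 49000; neither range-select encoding is set, so no offset applies.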

wallentx, Jan 01 '24 02:01

Also, I don't know whether this is important or relevant, but I'm posting this info just in case it is:

/etc/default/grub:GRUB_CMDLINE_LINUX_DEFAULT="mitigations=off nowatchdog nmi_watchdog=0 loglevel=3 quiet splash plymouth.ignore-serial-consoles module_blacklist=tpm log_buf_len=4M amd_iommu=off amdgpu.gttsize=8128 spi_amd.speed_dev=1 audit=0 fbcon=vc:4-6 fbcon=rotate:1"
/etc/default/grub:GRUB_CMDLINE_LINUX="console=tty1 rd.luks=0 rd.lvm=0 rd.md=0 rd.dm=0 rd.systemd.gpt_auto=no modprobe.blacklist=acpi_cpufreq idle=halt tsc=unstable"
/etc/default/grub-legacy:GRUB_CMDLINE_LINUX_DEFAULT="loglevel=3 quiet fbcon=rotate:1 iommu=off"
/etc/default/grub-legacy:GRUB_CMDLINE_LINUX="fbcon=rotate:1 iommu=off"
/etc/default/grub-steamos:GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX} amd_iommu=off amdgpu.gttsize=8128 spi_amd.speed_dev=1 audit=0 fbcon=vc:2-6"

sudo cat /proc/cmdline

BOOT_IMAGE=/boot/vmlinuz-linux-neptune-61 console=tty1 rd.luks=0 rd.lvm=0 rd.md=0 rd.dm=0 rd.systemd.gpt_auto=no modprobe.blacklist=acpi_cpufreq idle=halt tsc=unstable mitigations=off nowatchdog nmi_watchdog=0 module_blacklist=tpm log_buf_len=4M amd_iommu=off amdgpu.gttsize=8128 spi_amd.speed_dev=1 audit=0 fbcon=rotate:1 loglevel=3 splash quiet plymouth.ignore-serial-consoles fbcon=vc:4-6 steamos.efi=PARTUUID=8feffa0c-e2b4-af4c-960f-815938ad72d5
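The output above does contain modprobe.blacklist=acpi_cpufreq, idle=halt and tsc=unstable, so the parameters from the first post reached the running kernel. A minimal sketch of that check, using a hypothetical helper cmdline_has, looks for each parameter as a whole whitespace-separated token, so that blacklist=acpi_cpufreq does not match inside modprobe.blacklist=acpi_cpufreq. It assumes unquoted values; the kernel also accepts quoted values containing spaces, which this sketch does not handle.

```c
#include <string.h>

/*
 * Hypothetical helper: return 1 if `param` appears as a whole
 * whitespace-separated token in `cmdline` (e.g. /proc/cmdline contents).
 * Quoted values are not handled.
 */
int cmdline_has(const char *cmdline, const char *param)
{
	size_t plen = strlen(param);
	const char *p = cmdline;

	if (plen == 0)
		return 0;

	while ((p = strstr(p, param)) != NULL) {
		int starts = p == cmdline || p[-1] == ' ' || p[-1] == '\t';
		char end = p[plen];
		int ends = end == '\0' || end == ' ' || end == '\t' || end == '\n';

		if (starts && ends)
			return 1;
		p++; /* partial match only: keep searching after it */
	}
	return 0;
}
```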

wallentx, Jan 01 '24 02:01

Hello,

Since commit 62e8372a9ed311f979efa201a2b91652e12d4ff2 you can try the Memory Controller decoding; please post the output of corefreq-cli -M here.


Also, looking at the specs of K3LK7K70BM-BGCP, should we understand that this Zen2 architecture is LPDDR5-compatible?

cyring, Jan 01 '24 09:01

/etc/default/grub-steamos:GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX} amd_iommu=off amdgpu.gttsize=8128 spi_amd.speed_dev=1 audit=0 fbcon=vc:2-6"

Could this explain why the PCI IDs don't list an IOMMU controller?

For example, on my Matisse, lspci -nn returns an IOMMU at PCI 00:00.2:

00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse IOMMU [1022:1481]

Maybe Van Gogh definitively does not have an IOMMU.

cyring, Jan 01 '24 09:01

May be Van Gogh does definitively not have an IOMMU.

(deck@steamdeck ~)$ lspci -nn
00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] VanGogh Root Complex [1022:1645]
00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] VanGogh IOMMU [1022:1646]
00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
00:01.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] VanGogh PCIe GPP Bridge [1022:1647]
00:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] VanGogh PCIe GPP Bridge [1022:1647]
00:01.4 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] VanGogh PCIe GPP Bridge [1022:1647]
00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] VanGogh Internal PCIe GPP Bridge to Bus [1022:1648]
00:08.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] VanGogh Internal PCIe GPP Bridge to Bus [1022:1648]
00:08.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] VanGogh Internal PCIe GPP Bridge to Bus [1022:1648]
00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 71)
00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 0 [1022:1660]
00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 1 [1022:1661]
00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 2 [1022:1662]
00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 3 [1022:1663]
00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 4 [1022:1664]
00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 5 [1022:1665]
00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 6 [1022:1666]
00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] VanGogh Data Fabric; Function 7 [1022:1667]
01:00.0 Non-Volatile memory controller [0108]: Phison Electronics Corporation Device [1987:5021] (rev 01)
02:00.0 SD Host controller [0805]: O2 Micro, Inc. SD/MMC Card Reader Controller [1217:8621] (rev 01)
03:00.0 Network controller [0280]: Realtek Semiconductor Co., Ltd. RTL8822CE 802.11ac PCIe Wireless Network Adapter [10ec:c822]
04:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] VanGogh [AMD Custom GPU 0405] [1002:163f] (rev ae)
04:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt Radeon High Definition Audio Controller [1002:1640]
04:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] VanGogh PSP/CCP [1022:1649]
04:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] VanGogh USB2 [1022:162c]
04:00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] VanGogh USB1 [1022:163b]
04:00.5 Multimedia controller [0480]: Advanced Micro Devices, Inc. [AMD] ACP/ACP3X/ACP6x Audio Coprocessor [1022:15e2] (rev 50)
05:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function [1022:145a] (rev 61)
06:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function [1022:145a]

It is disabled in the BIOS by default.

wallentx, Jan 01 '24 09:01

Hello,

Since commit 62e8372 you can try the Memory Controller decoding and post here the output of /corefreq-cli -M

corefreq-cli -M

                              Zen UMC  [1660]                              
Controller #0                                                    Disabled 

wallentx, Jan 01 '24 10:01

Since commit ac246cdc7cd1bee2dde1218791e64c4f6fcc1f02 you get:

  1. a change for the SMU thermal register SMU_AMD_THM_TCTL_REGISTER_F17H (0x00059800)
  2. the IOMMU probing, which should reveal its version in the UI window Technologies/Virtualization

cyring, Jan 01 '24 10:01

Hello, Since commit 62e8372 you can try the Memory Controller decoding and post here the output of /corefreq-cli -M

corefreq-cli -M

                              Zen UMC  [1660]                              
Controller #0                                                    Disabled 

Can you try ./zencli umc?

cyring, Jan 01 '24 10:01

./zencli umc

🎰


Data Fabric: scanning UMC @ BAR[0x00050000] : 0 1 2 3 4 5 6 7 for 4 Channels

CHA[0] CHIP[0:0] @ 0x00050000[0x00000001] Enable, Rank=1
CHA[0] MASK[0:0] @ 0x00050020[0x00fffffe] ChipSize[2097152]
CHA[0] CHIP[0:1] @ 0x00050010[0x00000000] Disable, Rank=0
CHA[0] MASK[0:1] @ 0x00050028[0x00000000]
CHA[0] CHIP[1:0] @ 0x00050004[0x00000000] Disable, Rank=1
CHA[0] MASK[1:0] @ 0x00050020[0x00fffffe]
CHA[0] CHIP[1:1] @ 0x00050014[0x00000000] Disable, Rank=0
CHA[0] MASK[1:1] @ 0x00050028[0x00000000]
CHA[0] CHIP[2:0] @ 0x00050008[0x00000000] Disable, Rank=0
CHA[0] MASK[2:0] @ 0x00050024[0x00000000]
CHA[0] CHIP[2:1] @ 0x00050018[0x00000000] Disable, Rank=0
CHA[0] MASK[2:1] @ 0x0005002c[0x00000000]
CHA[0] CHIP[3:0] @ 0x00050008[0x00000000] Disable, Rank=0
CHA[0] MASK[3:0] @ 0x00050024[0x00000000]
CHA[0] CHIP[3:1] @ 0x00050018[0x00000000] Disable, Rank=0
CHA[0] MASK[3:1] @ 0x0005002c[0x00000000]

DIMM Size[2097152 KB] [2048 MB]

CHA[1] CHIP[0:0] @ 0x00150000[0x00000001] Enable, Rank=1
CHA[1] MASK[0:0] @ 0x00150020[0x00fffffe] ChipSize[2097152]
CHA[1] CHIP[0:1] @ 0x00150010[0x00000000] Disable, Rank=0
CHA[1] MASK[0:1] @ 0x00150028[0x00000000]
CHA[1] CHIP[1:0] @ 0x00150004[0x00000000] Disable, Rank=1
CHA[1] MASK[1:0] @ 0x00150020[0x00fffffe]
CHA[1] CHIP[1:1] @ 0x00150014[0x00000000] Disable, Rank=0
CHA[1] MASK[1:1] @ 0x00150028[0x00000000]
CHA[1] CHIP[2:0] @ 0x00150008[0x00000000] Disable, Rank=0
CHA[1] MASK[2:0] @ 0x00150024[0x00000000]
CHA[1] CHIP[2:1] @ 0x00150018[0x00000000] Disable, Rank=0
CHA[1] MASK[2:1] @ 0x0015002c[0x00000000]
CHA[1] CHIP[3:0] @ 0x00150008[0x00000000] Disable, Rank=0
CHA[1] MASK[3:0] @ 0x00150024[0x00000000]
CHA[1] CHIP[3:1] @ 0x00150018[0x00000000] Disable, Rank=0
CHA[1] MASK[3:1] @ 0x0015002c[0x00000000]

DIMM Size[2097152 KB] [2048 MB]

CHA[2] CHIP[0:0] @ 0x00250000[0x00000001] Enable, Rank=1
CHA[2] MASK[0:0] @ 0x00250020[0x00fffffe] ChipSize[2097152]
CHA[2] CHIP[0:1] @ 0x00250010[0x00000000] Disable, Rank=0
CHA[2] MASK[0:1] @ 0x00250028[0x00000000]
CHA[2] CHIP[1:0] @ 0x00250004[0x00000000] Disable, Rank=1
CHA[2] MASK[1:0] @ 0x00250020[0x00fffffe]
CHA[2] CHIP[1:1] @ 0x00250014[0x00000000] Disable, Rank=0
CHA[2] MASK[1:1] @ 0x00250028[0x00000000]
CHA[2] CHIP[2:0] @ 0x00250008[0x00000000] Disable, Rank=0
CHA[2] MASK[2:0] @ 0x00250024[0x00000000]
CHA[2] CHIP[2:1] @ 0x00250018[0x00000000] Disable, Rank=0
CHA[2] MASK[2:1] @ 0x0025002c[0x00000000]
CHA[2] CHIP[3:0] @ 0x00250008[0x00000000] Disable, Rank=0
CHA[2] MASK[3:0] @ 0x00250024[0x00000000]
CHA[2] CHIP[3:1] @ 0x00250018[0x00000000] Disable, Rank=0
CHA[2] MASK[3:1] @ 0x0025002c[0x00000000]

DIMM Size[2097152 KB] [2048 MB]

CHA[3] CHIP[0:0] @ 0x00350000[0x00000001] Enable, Rank=1
CHA[3] MASK[0:0] @ 0x00350020[0x00fffffe] ChipSize[2097152]
CHA[3] CHIP[0:1] @ 0x00350010[0x00000000] Disable, Rank=0
CHA[3] MASK[0:1] @ 0x00350028[0x00000000]
CHA[3] CHIP[1:0] @ 0x00350004[0x00000000] Disable, Rank=1
CHA[3] MASK[1:0] @ 0x00350020[0x00fffffe]
CHA[3] CHIP[1:1] @ 0x00350014[0x00000000] Disable, Rank=0
CHA[3] MASK[1:1] @ 0x00350028[0x00000000]
CHA[3] CHIP[2:0] @ 0x00350008[0x00000000] Disable, Rank=0
CHA[3] MASK[2:0] @ 0x00350024[0x00000000]
CHA[3] CHIP[2:1] @ 0x00350018[0x00000000] Disable, Rank=0
CHA[3] MASK[2:1] @ 0x0035002c[0x00000000]
CHA[3] CHIP[3:0] @ 0x00350008[0x00000000] Disable, Rank=0
CHA[3] MASK[3:0] @ 0x00350024[0x00000000]
CHA[3] CHIP[3:1] @ 0x00350018[0x00000000] Disable, Rank=0
CHA[3] MASK[3:1] @ 0x0035002c[0x00000000]

DIMM Size[2097152 KB] [2048 MB]

🤑

wallentx, Jan 01 '24 10:01

Because the DIMMs are DDR5, we will try another UMC query. Based on the latest commit:

  1. edit corefreqk.h https://github.com/cyring/CoreFreq/blob/ac246cdc7cd1bee2dde1218791e64c4f6fcc1f02/x86_64/corefreqk.h#L3171
  2. change it to use the Rembrandt decoder:
	{
		PCI_VDEVICE(AMD, DID_AMD_17H_JUPITER_DF_UMC),
		.driver_data = (kernel_ulong_t) AMD_DataFabric_Rembrandt
	},
  3. Save, rebuild, run, and please post the output of corefreq-cli -M

cyring, Jan 01 '24 10:01

  • In a Term of the window manager

[Screenshot: 2024-01-01_04-52]

wallentx, Jan 01 '24 10:01

Because DIMM are of DDR5, we will try another UMC query. Based on latest commit:

  1. edit corefreqk.h https://github.com/cyring/CoreFreq/blob/ac246cdc7cd1bee2dde1218791e64c4f6fcc1f02/x86_64/corefreqk.h#L3171
  2. change for the Rambrandt decoder
	{
		PCI_VDEVICE(AMD, DID_AMD_17H_JUPITER_DF_UMC),
		.driver_data = (kernel_ulong_t) AMD_DataFabric_Rembrandt
	},
  1. Save, rebuild, run and please post corefreq-cli -M

Hmm, no change.

Journal logs: https://gist.github.com/wallentx/58ff422e80b21a13d822909f9ea414bb

wallentx, Jan 01 '24 11:01

CPU #0   function         EAX          EBX          ECX          EDX            
|- 80000008:00000000    00003030     090cf657     00007007     00010000         

The availability of the Hardware CPPC registers is indicated by bit 27 of EBX.

Here EBX is 0x090cf657:

# EBX and CPPC mask
1001000011001111011001010111
1000000000000000000000000000

So the platform is capable of Hardware CPPC (what Intel calls HWP or Speed Shift), even if the BIOS reports it as disabled?

As you will read in the function AMD_F17h_CPPC() https://github.com/cyring/CoreFreq/blob/ac246cdc7cd1bee2dde1218791e64c4f6fcc1f02/x86_64/corefreqk.c#L3952, once the CPUID feature is confirmed we can safely read/write the associated MSRs. You can do the same from the command line with the Arch Linux msr-tools:

rdmsr -x 0xc00102b1 ## Enable state is at bit zero
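The two checks described above can be sketched in C. This is a hypothetical illustration, not CoreFreq's AMD_F17h_CPPC(): it tests bit 27 of CPUID Fn8000_0008 EBX (set in 0x090cf657) and reads MSR 0xc00102b1 through the Linux /dev/cpu/N/msr interface, the same one msr-tools' rdmsr uses, so reading needs root and the msr module. The CPUID query relies on GCC/Clang's <cpuid.h> and is only compiled on x86.

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64   /* MSR addresses above 2^31 need a 64-bit off_t */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define CPUID_8000_0008_EBX_HW_CPPC (1u << 27)   /* Hardware CPPC supported */
#define MSR_AMD_CPPC_ENABLE         0xc00102b1u  /* bit 0: CPPC enabled     */

/* Hypothetical helper: is the Hardware CPPC bit set in this EBX value? */
int ebx_has_hw_cppc(uint32_t ebx)
{
	return (ebx & CPUID_8000_0008_EBX_HW_CPPC) != 0;
}

/* Hypothetical helper: is the enable bit (bit 0) set in the MSR value? */
int hw_cppc_enabled(uint64_t msr_value)
{
	return (int)(msr_value & 1u);
}

/*
 * Read one 64-bit MSR on `cpu` via the Linux msr driver, where the file
 * offset is the MSR address. Needs root and `modprobe msr`. 0 on success.
 */
int read_msr(int cpu, uint32_t reg, uint64_t *value)
{
	char path[64];
	ssize_t n;
	int fd;

	snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	n = pread(fd, value, sizeof(*value), (off_t)reg);
	close(fd);
	return n == (ssize_t)sizeof(*value) ? 0 : -1;
}

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Query the running CPU; returns 0 if leaf 0x80000008 is unavailable. */
int cpu_has_hw_cppc(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(0x80000008u, &eax, &ebx, &ecx, &edx))
		return 0;
	return ebx_has_hw_cppc(ebx);
}
#endif
```

After a successful read_msr(0, MSR_AMD_CPPC_ENABLE, &v), hw_cppc_enabled(v) reports the same bit zero that rdmsr -x 0xc00102b1 shows.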

cyring, Jan 01 '24 11:01

hmm no change.

journal logs https://gist.github.com/wallentx/58ff422e80b21a13d822909f9ea414bb

It's hard to debug from here, given that zencli umc found some meaningful values...

cyring, Jan 01 '24 11:01

I'll continue this tomorrow and post the results 😴

Happy 2024! 🍾 🎉

wallentx, Jan 01 '24 11:01

A pre-release is under way if you wish to complete your issue: https://github.com/cyring/CoreFreq/discussions/472

cyring, Jan 18 '24 07:01