Add support for Ryzen 4000 APUs

Open ocerman opened this issue 4 years ago • 32 comments

original problem: ocerman/zenmonitor#27

  • [ ] Die Temperatures
  • [ ] CCD Temperatures
  • [ ] Voltages
  • [ ] Current

ocerman avatar Jun 06 '20 17:06 ocerman

As commented on ocerman/zenmonitor#27 I seem to get all the relevant data on my 4800H, probably through MSR (I didn't install your zenpower module ATM, I just discovered it).

Cheaterman avatar Jul 11 '20 11:07 Cheaterman

(As a follow-up question, is there anything I need zenpower for, that's not already covered by ACPI/P-states, MSR and friends?)

Cheaterman avatar Jul 11 '20 11:07 Cheaterman

I've been lying, sorry; I don't get direct access to voltages or currents, see attached screenshot. :-)

zenmon

Cheaterman avatar Jul 11 '20 11:07 Cheaterman

@ocerman Have you figured out the SVI2 addresses on Renoir? I've asked a user to scan the usual offsets, but they are all empty, with the exception of 0x0005A08C: 0x00000008

########################################################
SVI2: PCI Range
########################################################
0x0005A000: 0x00000000
0x0005A004: 0x00000000
0x0005A008: 0x00000000
0x0005A00C: 0x00000000
0x0005A010: 0x00000000
0x0005A014: 0x00000000
0x0005A018: 0x00000000
0x0005A01C: 0x00000000
0x0005A020: 0x00000000
0x0005A024: 0x00000000
0x0005A028: 0x00000000
0x0005A02C: 0x00000000
0x0005A030: 0x00000000
0x0005A034: 0x00000000
0x0005A038: 0x00000000
0x0005A03C: 0x00000000
0x0005A040: 0x00000000
0x0005A044: 0x00000000
0x0005A048: 0x00000000
0x0005A04C: 0x00000000
0x0005A050: 0x00000000
0x0005A054: 0x00000000
0x0005A058: 0x00000000
0x0005A05C: 0x00000000
0x0005A060: 0x00000000
0x0005A064: 0x00000000
0x0005A068: 0x00000000
0x0005A06C: 0x00000000
0x0005A070: 0x00000000
0x0005A074: 0x00000000
0x0005A078: 0x00000000
0x0005A07C: 0x00000000
0x0005A080: 0x00000000
0x0005A084: 0x00000000
0x0005A088: 0x00000000
0x0005A08C: 0x00000008
0x0005A090: 0x00000000
0x0005A094: 0x00000000
0x0005A098: 0x00000000
0x0005A09C: 0x00000000
0x0005A0A0: 0x00000000
0x0005A0A4: 0x00000000
0x0005A0A8: 0x00000000
0x0005A0AC: 0x00000000
0x0005A0B0: 0x00000000
0x0005A0B4: 0x00000000
0x0005A0B8: 0x00000000
0x0005A0BC: 0x00000000
0x0005A0C0: 0x00000000
0x0005A0C4: 0x00000000
0x0005A0C8: 0x00000000
0x0005A0CC: 0x00000000
0x0005A0D0: 0x00000000
0x0005A0D4: 0x00000000
0x0005A0D8: 0x00000000
0x0005A0DC: 0x00000000
0x0005A0E0: 0x00000000
0x0005A0E4: 0x00000000
0x0005A0E8: 0x00000000
0x0005A0EC: 0x00000000
0x0005A0F0: 0x00000000
0x0005A0F4: 0x00000000
0x0005A0F8: 0x00000000
0x0005A0FC: 0x00000000

As for the SMU PowerTable, this is what I've figured out so far for the needs of my app (not 100% sure, but it seems to work):

[Serializable]
[StructLayout(LayoutKind.Explicit)]
private struct PowerTableAPU1
{
	[FieldOffset(0x144)] public uint Fclk;
	[FieldOffset(0x154)] public uint Uclk;
	[FieldOffset(0x164)] public uint Mclk;
	[FieldOffset(0x198)] public uint VddcrSoc;
};
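
For readers working in C rather than C#, here is a minimal sketch of the same idea: given a raw copy of the power table as bytes, read the 32-bit value stored at a given offset. The offsets are the ones from the struct above and are just as unverified; the names `pt_read_u32` and `PT_APU1_*` are made up for illustration. The fields are kept as raw 32-bit values, as in the struct, and whether they should be interpreted as integers or floats is left open here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets copied from the PowerTableAPU1 struct above (unverified). */
enum {
    PT_APU1_FCLK      = 0x144,
    PT_APU1_UCLK      = 0x154,
    PT_APU1_MCLK      = 0x164,
    PT_APU1_VDDCR_SOC = 0x198
};

/*
 * Copy the raw 32-bit field at `offset` into *out.
 * Returns 0 on success, -1 if a pointer is NULL or the field would
 * fall outside the buffer. memcpy avoids unaligned access; the value
 * is returned in host byte order.
 */
static int pt_read_u32(const uint8_t *table, size_t len, size_t offset,
                       uint32_t *out)
{
    if (table == NULL || out == NULL)
        return -1;
    if (len < sizeof(*out) || offset > len - sizeof(*out))
        return -1;
    memcpy(out, table + offset, sizeof(*out));
    return 0;
}
```

The bounds check matters in practice because table sizes differ between versions, so an offset that is valid for one table may point past the end of another.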

irusanov avatar Aug 18 '20 21:08 irusanov

SVI2 for Renoir seems to be

public const uint F17H_M02H_SVI = 0x0006F000;
public const uint F17H_M60H_SVI_TEL_PLANE0 = (F17H_M02H_SVI + 0x38);
public const uint F17H_M60H_SVI_TEL_PLANE1 = (F17H_M02H_SVI + 0x3C);

I'm not 100% sure about the naming of the register base (F17H_M02H_SVI), because I made it up, but a deeper PCI register scan dump from users reveals this is the base address and these are the offsets. Again, I'm not completely sure about the offsets, but at least VSOC seems to work.

irusanov avatar Sep 11 '20 01:09 irusanov

Is that your own code?

Also, from what I found, the ZEN SVI register base is always 0x0005A000. PLANE0/1 seem to be like this for ZEN2 and later:

Server/TR parts:

PLANE0 = (SVI_BASE + 0x14);
PLANE1 = (SVI_BASE + 0x10);

Ryzen Desktop, APU Desktop, APU Mobile:

PLANE0 = (SVI_BASE + 0x10);
PLANE1 = (SVI_BASE + 0xC);

ZEN1/ZEN+ seems to have the PLANE0/1 swapped, but ZEN1 is a mess anyway.

I don't have any Renoir box to test that, but if someone is willing to test, a patch could be added to zenpower to play around.
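
To keep the candidates straight, here is a small C sketch that collects the addresses mentioned so far in one lookup: the two ZEN2-and-later layouts listed above, plus the Renoir base and offsets proposed in the earlier comment. None of these values are confirmed here; the enum, struct and function names are invented for this example and do not come from zenpower.

```c
#include <stdint.h>

#define SVI_BASE_COMMON 0x0005A000u /* base reported for most parts */
#define SVI_BASE_RENOIR 0x0006F000u /* base proposed for Renoir in this thread */

enum svi_platform {
    SVI_SERVER_TR,        /* server / Threadripper parts */
    SVI_CLIENT,           /* Ryzen desktop, APU desktop, APU mobile */
    SVI_RENOIR_CANDIDATE  /* values still under test in this thread */
};

struct svi_planes {
    uint32_t plane0; /* absolute address of telemetry plane 0 */
    uint32_t plane1; /* absolute address of telemetry plane 1 */
};

/* Return the absolute plane addresses for one of the layouts above. */
static struct svi_planes svi_planes_for(enum svi_platform p)
{
    struct svi_planes r = { 0, 0 };

    switch (p) {
    case SVI_SERVER_TR:
        r.plane0 = SVI_BASE_COMMON + 0x14;
        r.plane1 = SVI_BASE_COMMON + 0x10;
        break;
    case SVI_CLIENT:
        r.plane0 = SVI_BASE_COMMON + 0x10;
        r.plane1 = SVI_BASE_COMMON + 0x0C;
        break;
    case SVI_RENOIR_CANDIDATE:
        r.plane0 = SVI_BASE_RENOIR + 0x38;
        r.plane1 = SVI_BASE_RENOIR + 0x3C;
        break;
    }
    return r;
}
```

Keeping base and offsets together per layout makes it easy to try a different combination on a test machine by editing a single case.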

abucodonosor avatar Dec 25 '20 01:12 abucodonosor

@abucodonosor Yes, the SVI2 base address on desktop, server and mobile SKUs seems to be 0x0005A000. The only exception so far is Renoir with a base address of 0x0006F000, but I expect future APUs to have the same base address. I found that from reports by users who ran SMUDebugTool on their systems.

I'm the author of ZenTimings and reading VSOC seems to be working fine for Renoir. Not completely sure about the other plane, but it can be found experimentally.

Sample dump from desktop Renoir system, but I think it is the same for mobile.

######################################################
SVI2: PCI Range
######################################################
0x0006F000: 0x00000000
0x0006F004: 0x00000010
0x0006F008: 0x00000010
0x0006F00C: 0x000186A0
0x0006F010: 0x00000010
0x0006F014: 0x000186A0
0x0006F018: 0x0000003C
0x0006F01C: 0x00000000
0x0006F020: 0x00000000
0x0006F024: 0x0000010E
0x0006F028: 0x0000000E
0x0006F02C: 0x00000000
0x0006F030: 0x00730001
0x0006F034: 0x00000002
0x0006F038: 0x012A001F
0x0006F03C: 0x0172001D
0x0006F040: 0x00000000
0x0006F044: 0x00000100
0x0006F048: 0x0000FF00
0x0006F04C: 0x00000000
0x0006F050: 0x00000000
0x0006F054: 0x00000000
0x0006F058: 0x00000000
0x0006F05C: 0x2B000000
0x0006F060: 0x73000000
0x0006F064: 0x00000000
0x0006F068: 0x00000000
0x0006F06C: 0x00000303
0x0006F070: 0x00000003
0x0006F074: 0x00000000
0x0006F078: 0x80000002
0x0006F07C: 0x80000002
0x0006F080: 0x80000041
0x0006F084: 0x00000000
0x0006F088: 0x00000000
0x0006F08C: 0x00000000
0x0006F090: 0x00000000
0x0006F094: 0x00000000
0x0006F098: 0x00000000
0x0006F09C: 0x00000000
0x0006F0A0: 0x00000000
0x0006F0A4: 0x01FF00FF
0x0006F0A8: 0x00000000
0x0006F0AC: 0x00000000
0x0006F0B0: 0x00000000
0x0006F0B4: 0x00000000
0x0006F0B8: 0x01FF00FF
0x0006F0BC: 0x00000000
0x0006F0C0: 0x00000001
0x0006F0C4: 0x00000001
0x0006F0C8: 0x00000000
0x0006F0CC: 0x00000000
0x0006F0D0: 0x00000000
0x0006F0D4: 0x00000000
0x0006F0D8: 0x00000000
0x0006F0DC: 0x00000000
0x0006F0E0: 0x00000000
0x0006F0E4: 0x00000000
0x0006F0E8: 0x00000000
0x0006F0EC: 0x00000000
0x0006F0F0: 0x00000000
0x0006F0F4: 0x00000000
0x0006F0F8: 0x00000000
0x0006F0FC: 0x00000000

PS: Actually that's a mobile CPU, so yes - it works on both desktop and mobile SKUs.

######################################################
System Info
######################################################
OS: Microsoft Windows 10 Enterprise
CpuName: AMD Ryzen 7 4700U with Radeon Graphics
CodeName: Renoir
CpuId: 00860F01
Model: 96
ExtendedModel: 96
PackageType: 0
FusedCoreCount: 8
PhysicalCoreCount: 0
NodesPerProcessor: 1
Threads: 8
SMT: False
CCDCount: 0
CCXCount: 0
NumCoresInCCX: 0
MbVendor: HP
MbName: 8730
BiosVersion: S79 Ver. 01.03.01
SmuVersion: 55.69.00
SmuTableVersion: 00370005
PatchLevel: 08600106

irusanov avatar Dec 25 '20 18:12 irusanov

I'm the author of ZenTimings

Duh, I suck, how did I miss that? :)

@irusanov

Looks like you are correct about the base address, and yes, we could experiment with the PLANEs to figure them out. We just need someone with a Renoir box who is willing to test :)

@Cheaterman maybe you?

abucodonosor avatar Dec 25 '20 21:12 abucodonosor

I still have a Ryzen 7 4800H laptop I can test with, yes! Do tell which branch of which repo to clone and I'll give it a try! :-)

Cheaterman avatar Dec 26 '20 09:12 Cheaterman

@Cheaterman

There isn't a git branch yet, but I've made a patch with the values @irusanov found. From there we need to experiment until we find the right PLANE0/1 values.

Pull the zenpower git repo and apply:

https://crazy.dev.frugalware.org/Zen2-Renoir-test.patch

Changing the PLANE addresses is easy. In zenpower.c there will be this:

#define F17H_RN_SVI_TEL_PLANE0          (F17H_RN_SVI + 0x38)
#define F17H_RN_SVI_TEL_PLANE1          (F17H_RN_SVI + 0x3C)

Say we want to test the desktop offsets: then we just need to change PLANE0 from 0x38 to 0x10 and PLANE1 from 0x3C to 0xC, and so on, until we find what works.

Thank you for doing that :)

abucodonosor avatar Dec 26 '20 12:12 abucodonosor

Thanks, the offsets make sense given I'm on non-server! Just to be clear, what am I supposed to expect if it does/doesn't work? Testing tonight :-)

Cheaterman avatar Dec 26 '20 15:12 Cheaterman

@Cheaterman

Well, it is about the SVI base address in the first place, which looks to be different from any other generation/model.

If that works as is you should be able to see SVI2 Core/SoC voltage etc.

abucodonosor avatar Dec 26 '20 15:12 abucodonosor

@abucodonosor The offsets on the desktop are the same. I have dumps from both desktop and mobile Renoir.

irusanov avatar Dec 26 '20 15:12 irusanov

@irusanov

OK. I've made the first patch with your findings, so we'll see if that works as soon as @Cheaterman has a chance to test it.

BTW, are all Renoir SKUs model 0x60?

abucodonosor avatar Dec 26 '20 15:12 abucodonosor

Yes, family 0x17, model 0x60. https://en.wikichip.org/wiki/amd/cpuid#Family_23_.2817h.29

They only differ in package type, I think.
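
As a cross-check of the family/model numbers, here is a short C sketch that decodes the raw CPUID signature (EAX of CPUID leaf 1) the way AMD documents it: the extended family and extended model fields are only added in when the base family is 0xF. The struct and function names are made up for this example. Applied to the CpuId value 00860F01 from the System Info dump above, it yields family 0x17 (23) and model 0x60 (96).

```c
#include <stdint.h>

struct cpu_sig {
    unsigned family;
    unsigned model;
    unsigned stepping;
};

/* Decode family/model/stepping from CPUID leaf 1 EAX (AMD rules). */
static struct cpu_sig decode_cpuid_eax(uint32_t eax)
{
    struct cpu_sig s;
    unsigned base_family = (eax >> 8) & 0xF;   /* bits 11:8  */
    unsigned base_model  = (eax >> 4) & 0xF;   /* bits 7:4   */
    unsigned ext_model   = (eax >> 16) & 0xF;  /* bits 19:16 */
    unsigned ext_family  = (eax >> 20) & 0xFF; /* bits 27:20 */

    s.stepping = eax & 0xF;                    /* bits 3:0   */
    s.family = base_family;
    s.model = base_model;
    if (base_family == 0xF) {
        /* Extended fields are only meaningful when base family is 0xF. */
        s.family += ext_family;
        s.model |= ext_model << 4;
    }
    return s;
}
```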

irusanov avatar Dec 26 '20 19:12 irusanov

Hi, sorry for the delay! Holidays :smile:

I'm again not quite clear as to what I'm looking for - you'll see I have SVI2 power and current files in sysfs in both cases (logs attached). If anything, the values seem more correct with 0x38 + 0x3C, unless the units are different? (EDIT: That's actually a very bold assumption, I have no idea what correct values should look like...) I have the exact same number (and names) of sysfs files with both versions of the module by the way (forgot to show that in the logs).

Keep me in touch if you need more values tested :smile:

logs.txt

(note the sanity check :sweat_smile: 17h == 23, right?)

Cheaterman avatar Dec 29 '20 12:12 Cheaterman

@Cheaterman

Just stress the CPU a bit and run the sensors command in a terminal. That should give us a better idea.

abucodonosor avatar Dec 29 '20 17:12 abucodonosor

@abucodonosor

Great idea indeed - seems to confirm 0x10 and 0x0C give bogus data, as I suspected above:

zenpower-pci-00c3
Adapter: PCI adapter
SVI2_Core:     1.55 V  
SVI2_SoC:      1.54 V  
Tdie:         +44.9°C  (high = +95.0°C)
Tctl:         +44.9°C  
SVI2_P_Core:  16.34 W  
SVI2_P_SoC:   72.70 W  
SVI2_C_Core:  10.54 A  
SVI2_C_SoC:   47.09 A  

This is permanent - regardless of load.

On the other hand, if I'm using 0x38 + 0x3C, this is what I get when idling:

zenpower-pci-00c3
Adapter: PCI adapter
SVI2_Core:   757.00 mV 
SVI2_SoC:    894.00 mV 
Tdie:         +47.8°C  (high = +95.0°C)
Tctl:         +47.8°C  
SVI2_P_Core:   3.40 W  
SVI2_P_SoC:    6.31 W  
SVI2_C_Core:   4.61 A  
SVI2_C_SoC:    7.06 A  

And under load:

zenpower-pci-00c3
Adapter: PCI adapter
SVI2_Core:   913.00 mV 
SVI2_SoC:    894.00 mV 
Tdie:         +60.5°C  (high = +95.0°C)
Tctl:         +60.5°C  
SVI2_P_Core:  37.89 W  
SVI2_P_SoC:    7.37 W  
SVI2_C_Core:  41.51 A  
SVI2_C_SoC:    8.24 A  

So, looks like 0x38 and 0x3C work better than 0x10 and 0x0C. If you have thoughts, please let me know :smile: I hope the testing was satisfactory!
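
The constant readings can be reproduced from the raw Renoir dump posted earlier in the thread. The sketch below assumes the conversion zenpower applies to a telemetry word, as recalled from its source and not re-verified here: voltage from bits 23:16 as 1550 mV minus 6.25 mV per step (in integer math), and current from bits 7:0 scaled by 658.823 mA per step for the core plane and 294.3 mA per step for the SoC plane on Zen 2. Treat those constants and the function names as assumptions.

```c
#include <stdint.h>

/* Assumed Zen 2 current scale factors, in microamps per LSB. */
#define SVI_CORE_UA_PER_LSB 658823u
#define SVI_SOC_UA_PER_LSB  294300u

/* Voltage in mV from bits 23:16: U = 1550 - 6.25 * VID (integer math). */
static uint32_t svi_plane_to_mv(uint32_t plane)
{
    uint32_t vid = (plane >> 16) & 0xFF;
    uint32_t drop = (625 * vid) / 100;

    /* Clamp instead of wrapping around for very large VID codes. */
    return drop > 1550 ? 0 : 1550 - drop;
}

/* Current in mA from bits 7:0, scaled by the given factor. */
static uint32_t svi_plane_to_ma(uint32_t plane, uint32_t ua_per_lsb)
{
    uint32_t idd = plane & 0xFF;

    return (ua_per_lsb * idd) / 1000;
}
```

With these formulas, the words at +0x10 (0x00000010) and +0x0C (0x000186A0) in that dump decode to 1550 mV / 10541 mA and 1544 mV / 47088 mA, which are exactly the 1.55 V / 10.54 A and 1.54 V / 47.09 A shown above. The dump comes from a different Renoir machine, so this is suggestive rather than proof, but it fits the picture that those two offsets hold static values rather than live telemetry on Renoir.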

Cheaterman avatar Dec 30 '20 09:12 Cheaterman

Hey, quick question: What is the connection between the SMU, SMUDebugTool and SMN registers? From what I understand right now, the SMU is a Freescale microcontroller inside the CPU with which one can communicate by sending commands through SMN registers and collecting the response. But zenpower does not send commands. It only reads registers that seem to be updated automatically either by hardware or possibly ring0? What is the exact connection to SMUDebugTool? Or are you just using it to poke some registers without actually communicating with the SMU?

@Cheaterman

EDIT: That's actually a very bold assumption, I have no idea what correct values should look like...

I recently managed to reverse engineer the PM table for Ryzen 5000. Values in there are already scaled to SI units and of higher resolution. Now I'm trying to correlate those values to what zenpower reports, to get that one working as well. Maybe a similar approach is possible for Ryzen 4000 as well? EDIT: In hindsight, I think @irusanov already had the same approach a few comments further up.

For reference, I attached a screenshot of what information I could pull from the PM table of my system. I found the reported values to be very stable and reproducible, and could confirm they match the values I got from measuring individual power planes and currents on my mainboard. So I'm very confident those values are reasonably accurate and could be a solid base for verifying zenpower readings, given we manage to find the corresponding fields in the PM table of the 4000 APUs.

PM_table Ryzen 5000

hattedsquirrel avatar Dec 30 '20 13:12 hattedsquirrel

SMUDebugTool (actually much more than just the communication with the SMU coprocessor) was a tool I quickly coded to help me debug and poke the black box that Ryzen is. It got many more features over time, so it now has several "subtools" inside.

  • Manual OC settings to force the CPU in overclock mode (this was a test feature, ZenStates has more controls, including voltage)
  • SMU scanner for available mailboxes (Message address/register, Response and Arguments) and can send commands and read the response from those mailboxes (SMN). The SMN protocol is somewhat an extension to the common way you communicate with PCI devices via index and data ports. It just has special registers for read/write operations.
  • Power Table monitor (it is used via the SMN protocol by telling the SMU to send the table to the RAM, then read from a specified address)
  • Mailbox/SMU Monitor which tries to detect sent commands to a specific mailbox from other applications, e.g. Ryzen Master, HWInfo, etc. This way I could find many commands (which differ between generations and sockets). For example the command to "refresh" the Power Table.
  • PCI register read/write and scanner
  • MSR read/write and scanner
  • CPUID Instructions read/write and scanner
  • PStates read/write
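
To make the mailbox idea from the list above concrete, here is a rough C sketch of one command round trip. It assumes the sequence that open tools such as ryzen_smu appear to use (clear the response register, write the argument, write the message ID, poll the response register) and assumes 0x01 as the success code; neither is confirmed in this thread. Register access goes through hypothetical callbacks, and a tiny fake SMU is included so the flow can be tried without hardware. All names and addresses here are invented.

```c
#include <stdint.h>

/* Hypothetical register accessors; a real tool would back these with SMN I/O. */
struct smu_io {
    uint32_t (*read)(void *ctx, uint32_t addr);
    void (*write)(void *ctx, uint32_t addr, uint32_t value);
    void *ctx;
};

/* One mailbox: the addresses differ per generation and must be discovered. */
struct smu_mailbox {
    uint32_t msg_addr;
    uint32_t rsp_addr;
    uint32_t arg_addr;
};

#define SMU_RSP_OK 0x01u /* assumed success code */

/* Send one command; returns the response code, or 0 if polling timed out. */
static uint32_t smu_send(const struct smu_io *io, const struct smu_mailbox *mb,
                         uint32_t msg, uint32_t *arg0, unsigned max_polls)
{
    uint32_t rsp = 0;
    unsigned i;

    io->write(io->ctx, mb->rsp_addr, 0);     /* clear any stale response */
    io->write(io->ctx, mb->arg_addr, *arg0); /* pass the argument */
    io->write(io->ctx, mb->msg_addr, msg);   /* writing the ID starts the command */

    for (i = 0; i < max_polls && rsp == 0; i++)
        rsp = io->read(io->ctx, mb->rsp_addr);

    if (rsp == SMU_RSP_OK)
        *arg0 = io->read(io->ctx, mb->arg_addr); /* result comes back in the argument */
    return rsp;
}

/* Fake SMU for experiments: message 0x1 returns argument + 1, anything else fails. */
enum { FAKE_MSG_ADDR = 0x0, FAKE_RSP_ADDR = 0x4, FAKE_ARG_ADDR = 0x8 };
struct fake_smu { uint32_t rsp, arg; };

static uint32_t fake_read(void *ctx, uint32_t addr)
{
    struct fake_smu *f = ctx;

    if (addr == FAKE_RSP_ADDR)
        return f->rsp;
    return addr == FAKE_ARG_ADDR ? f->arg : 0;
}

static void fake_write(void *ctx, uint32_t addr, uint32_t value)
{
    struct fake_smu *f = ctx;

    if (addr == FAKE_RSP_ADDR) {
        f->rsp = value;
    } else if (addr == FAKE_ARG_ADDR) {
        f->arg = value;
    } else if (addr == FAKE_MSG_ADDR) {
        if (value == 0x1) {
            f->arg += 1;
            f->rsp = SMU_RSP_OK;
        } else {
            f->rsp = 0xFF; /* arbitrary failure code used by this fake only */
        }
    }
}
```

Wiring `fake_read` and `fake_write` into a `struct smu_io` and sending message 0x1 with argument 41 returns `SMU_RSP_OK` and leaves 42 in the argument; real hardware would of course need the correct addresses and message IDs for the specific generation.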

For example, here's a running and auto-refreshing power table of the CPU currently installed in my motherboard - 3000G. You can then probably match the reported values with their offsets to the names that e.g. HWinfo displays. Unfortunately you need Windows for that.

image

irusanov avatar Dec 30 '20 14:12 irusanov

@hattedsquirrel @irusanov

There is an SMU kernel module for Linux on gitlab.

https://gitlab.com/leogx9r/ryzen_smu

They have an issue open about RN PM table too: https://gitlab.com/leogx9r/ryzen_smu/-/issues/2

Also, there is this:

https://github.com/sbski/Renoir-Mobile-Tuning/blob/master/renoir_tuning_utility/renoir_tuning_utility/SystemMonitor.cs

If you ask me, we should all somehow join in one big project, even if it is just one to share data and findings, no matter the operating system, because the data, registers etc. used are the same.

As for Linux, all those random drivers scattered all over the place should probably be merged into one or two working ones.

abucodonosor avatar Dec 30 '20 15:12 abucodonosor

@abucodonosor Yes, ryzen_smu is what I used to build that thing I posted the screenshot of. I just don't have any clue where they take the table of symbols from and had to guess which value is which for Ryzen 5000.

Joining forces: Yes, absolutely. I'm working on documenting what I did during the past days. Hopefully I did not just redo what somebody else with more knowledge has already done somewhere else...

One big driver: For my personal concerns I'm quite happy with what I can get from the PM table, but I would still like to get the SMN access that zenpower uses going, because requesting a full PM table update from the SMU every time someone accesses the "sensors" kernel interface seems a bit wasteful. But maybe there is a more clever way? I still don't understand much of the inner workings after all ;-)

@irusanov What would be a good resource to start learning about how this whole mailbox communication thing and everything around it works? Coming from much simpler platforms, I always thought everything gets mapped into one memory space. But SMN registers are accessed by their own ASM instructions, so I imagined it was maybe some sort of second memory space? It all becomes a blur from there on. (And googling for three-letter acronyms like SMN, SMU and BAR on their own didn't turn out to be very fruitful so far...)

hattedsquirrel avatar Dec 30 '20 15:12 hattedsquirrel

I don't really know what to recommend. I've been working on that stuff for a year and a half and most of it has been trial and error, especially in the beginning.

I don't think there's another way of refreshing the table. All software tools I've seen send the refresh command (transfer table to DRAM) at a given interval in order to get the current data: RyzenMaster, Hwinfo, ZenTimings, Asus Tool, perhaps many more. Most refresh every 2 seconds, but some spam the SMU at a much shorter polling interval (Asus Tool).
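
One way a driver could avoid asking the SMU for a new table on every single read is to rate-limit the refresh. Below is a minimal C sketch of that idea, not something taken from zenpower or ryzen_smu: the caller passes in a monotonic timestamp in milliseconds, and the helper says whether a refresh command should be sent now. The struct and function names are invented, and the 2-second interval is only the figure mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

struct pm_cache {
    uint64_t last_refresh_ms; /* time of the last refresh request */
    uint64_t min_interval_ms; /* e.g. 2000 for a 2 second interval */
    bool valid;               /* false until the first refresh */
};

/*
 * Decide whether the table should be refreshed at time `now_ms`
 * (a monotonic clock supplied by the caller). When it returns true,
 * the timestamp is updated and the caller is expected to send the
 * refresh command; otherwise the cached copy should be reused.
 */
static bool pm_cache_should_refresh(struct pm_cache *c, uint64_t now_ms)
{
    if (c->valid && now_ms - c->last_refresh_ms < c->min_interval_ms)
        return false;
    c->last_refresh_ms = now_ms;
    c->valid = true;
    return true;
}
```

With this in place, many readers polling the sysfs files in quick succession would trigger at most one table transfer per interval.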

irusanov avatar Dec 30 '20 16:12 irusanov

@hattedsquirrel

Regarding the access method, I don't think that matters on Linux. A kernel module will add files to sysfs and either update them every X seconds/milliseconds, to be read with some_tool, or update everything once some_tool is used to display updated information.

I tested that ryzen_smu on 4 boxes here and it worked on exactly none :-).

abucodonosor avatar Dec 30 '20 23:12 abucodonosor

I tested that ryzen_smu on 4 boxes here and it worked on exactly none :-).

Did the kernel module not work at all? Or just the userspace tool (monitor_cpu)?

hattedsquirrel avatar Dec 30 '20 23:12 hattedsquirrel

I tested that ryzen_smu on 4 boxes here and it worked on exactly none :-).

Did the kernel module not work at all? Or just the userspace tool (monitor_cpu)?

On 3750H and 3700U it loads, but all I get is garbage since the SMU version is not supported or something like that, and I have to use the tool with force. On the dual-socket EPYC box it won't load: it doesn't support multi-socket at all, and so it tries to create duplicated files. The old 2200G gives some cryptic errors.

abucodonosor avatar Dec 30 '20 23:12 abucodonosor

On 3750H and 3700U it loads, but all I get is garbage since the SMU version is not supported or something like that, and I have to use the tool with force.

Same for 5900X. By saying "I had to reverse engineer the PM table" I meant that I had to find out what is at which index. Took me some time but was doable after all.

hattedsquirrel avatar Dec 31 '20 00:12 hattedsquirrel

@hattedsquirrel

Ok :). I'll try to see what I can do next week, I have more CPUs here to test but I'm in the wrong place at the moment, can only test the laptops I have with me here, the old a300 MiniDesk and Server CPUs.

I can try it on a 2600,2700X,3600,3950X, and on a Threadripper 1950X next week and see what I get with these.

TBH, I like the SMU module's output, but it looks like it needs a lot of work.

abucodonosor avatar Dec 31 '20 00:12 abucodonosor

OK, I created an issue over at ryzen_smu sharing the code I've got right now: https://gitlab.com/leogx9r/ryzen_smu/-/issues/5 It is a bit of a mess, but sharing it for future reference might at least be something.

I agree, especially the monitor_cpu tool needs work done. Originally, I wanted to keep the support for the 3700X table in there but then had a between-the-ears-error and didn't see how one would easily do that without a rewrite. If there is interest from the project, I'd be willing to do that.

hattedsquirrel avatar Dec 31 '20 01:12 hattedsquirrel

It's not that easy to support all possible CPUs. It's a little bit easier now with the information we have, but it was much harder 1 year ago.

Every generation has different SMU commands and several versions of the PM table. Some versions differ slightly, some are completely incompatible. Everything depends on AGESA and the SMU firmware in the BIOS as well. Desktop AM4 APUs have a completely different table than "the same" mobile APUs (FPx socket). It's really hard to support a SKU when you don't have it in hand, and even then you're bound to the motherboard you're using. While working on ZenTimings I've seen all sorts of inconsistencies between different vendors, although the situation is a little bit better with Zen2 and upwards. Zen and Zen+ are a mess.

One recent example - two Picasso desktop APUs on different motherboards. My motherboard uses Zen SVI2 addresses and Raven SMU Firmware, while the other motherboard uses specific Picasso Firmware and switched SVI2 addresses (same as Naples and Whitehaven).

Joining forces would be good. I also have a Linux project (ZenStates-Linux, a fork of the original abandoned one), but it's in Python :)

@hattedsquirrel

I agree, especially the monitor_cpu tool needs work done. Originally, I wanted to keep the support for the 3700X table in there but then had a between-the-ears-error and didn't see how one would easily do that without a rewrite. If there is interest from the project, I'd be willing to do that.

You can't reuse tables between different generations. Usually the first part is somewhat consistent, but the rest isn't. The correct table needs to be selected based on the detected version (there is an SMU command for that), and each generation has several such versions. Values seem to be divided into groups, and AMD inserts more values into those groups, so all the rest is pushed "down" and the offsets change.
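
A small C sketch of what version-based selection could look like: a list of known layouts keyed by table version, with the lookup returning NULL for anything unknown so the caller can refuse to decode rather than print garbage. The two version numbers and the second set of offsets are entirely made up to show how one inserted value shifts everything after it; the first set of offsets is borrowed from the PowerTableAPU1 struct earlier in the thread purely as sample numbers.

```c
#include <stddef.h>
#include <stdint.h>

struct pm_layout {
    uint32_t version;  /* table version as reported by the SMU */
    size_t fclk;       /* byte offsets of a few fields */
    size_t uclk;
    size_t mclk;
    size_t vddcr_soc;
};

/* Hypothetical versions: B is A with one extra 4-byte value inserted before fclk. */
static const struct pm_layout known_layouts[] = {
    { 0x00000001u, 0x144, 0x154, 0x164, 0x198 },
    { 0x00000002u, 0x148, 0x158, 0x168, 0x19C },
};

/* Return the layout for `version`, or NULL if the version is not known. */
static const struct pm_layout *pm_layout_for(uint32_t version)
{
    size_t i;

    for (i = 0; i < sizeof(known_layouts) / sizeof(known_layouts[0]); i++) {
        if (known_layouts[i].version == version)
            return &known_layouts[i];
    }
    return NULL;
}
```

Returning NULL for unknown versions mirrors the experience reported above, where forcing a tool onto an unsupported table only produced garbage.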

irusanov avatar Dec 31 '20 11:12 irusanov