
Add an initial Intel Celeron 069C port

Open sanmai-NL opened this issue 1 year ago • 17 comments

Unclear fault message. I can think of various causes, but wanted to file this to get this fault message documented and/or clarified in the UI of powmon itself.

A build of today, 65cb28e2596a29f6f7fbf6d6432c23d4f7a6a9c5.

sanmai@server4 /d/s/v/i/bin (dev)> ./powmon -a /data/sanmai/ntpd-rs/target/miri/x86_64-unknown-linux-gnu/debug/ntp-ctl 
Trace and summary files will be dumped in ./
Using sampling interval of: 50 ms
Profiling: /data/sanmai/ntpd-rs/target/miri/x86_64-unknown-linux-gnu/debug/ntp-ctl
Fork failure
(null):/data/sanmai/variorum/src/variorum/config_architecture.c:variorum_enter():100: _ERROR_VARIORUM_UNSUPPORTED_PLATFORM: Cannot set function pointers
(null):/data/sanmai/variorum/src/variorum/config_architecture.c:variorum_enter():100: _ERROR_VARIORUM_UNSUPPORTED_PLATFORM: Cannot set function pointers
(null):/data/sanmai/variorum/src/variorum/config_architecture.c:variorum_enter():100: _ERROR_VARIORUM_UNSUPPORTED_PLATFORM: Cannot set function pointers
(null):/data/sanmai/variorum/src/variorum/config_architecture.c:variorum_enter():100: _ERROR_VARIORUM_UNSUPPORTED_PLATFORM: Cannot set function pointers
(null):/data/sanmai/variorum/src/variorum/config_architecture.c:variorum_enter():100: _ERROR_VARIORUM_UNSUPPORTED_PLATFORM: Cannot set function pointers
(null):/data/sanmai/variorum/src/variorum/config_architecture.c:variorum_enter():100: _ERROR_VARIORUM_UNSUPPORTED_PLATFORM: Cannot set function pointers
(null):/data/sanmai/variorum/src/variorum/config_architecture.c:variorum_enter():100: _ERROR_VARIORUM_UNSUPPORTED_PLATFORM: Cannot set function pointers
(null):/data/sanmai/variorum/src/variorum/config_architecture.c:variorum_enter():100: _ERROR_VARIORUM_UNSUPPORTED_PLATFORM: Cannot set function pointers
(null):/data/sanmai/variorum/src/variorum/config_architecture.c:variorum_enter():100: _ERROR_VARIORUM_UNSUPPORTED_PLATFORM: Cannot set function pointers
(null):/data/sanmai/variorum/src/variorum/config_architecture.c:variorum_enter():100: _ERROR_VARIORUM_UNSUPPORTED_PLATFORM: Cannot set function pointers
(null):/data/sanmai/variorum/src/variorum/config_architecture.c:variorum_enter():100: _ERROR_VARIORUM_UNSUPPORTED_PLATFORM: Cannot set function pointers
(null):/data/sanmai/variorum/src/variorum/config_architecture.c:variorum_enter():100: _ERROR_VARIORUM_UNSUPPORTED_PLATFORM: Cannot set function pointers
(null):/data/sanmai/variorum/src/variorum/config_architecture.c:variorum_enter():100: _ERROR_VARIORUM_UNSUPPORTED_PLATFORM: Cannot set function pointers
(null):/data/sanmai/variorum/src/variorum/config_architecture.c:variorum_enter():100: _ERROR_VARIORUM_UNSUPPORTED_PLATFORM: Cannot set function pointers
(null):/data/sanmai/variorum/src/variorum/config_architecture.c:variorum_enter():100: _ERROR_VARIORUM_UNSUPPORTED_PLATFORM: Cannot set function pointers
(null):/data/sanmai/variorum/src/variorum/config_architecture.c:variorum_enter():100: _ERROR_VARIORUM_UNSUPPORTED_PLATFORM: Cannot set function pointers
(null):/data/sanmai/variorum/src/variorum/config_architecture.c:variorum_enter():100: _ERROR_VARIORUM_UNSUPPORTED_PLATFORM: Cannot set function pointers
(null):/data/sanmai/variorum/src/variorum/config_architecture.c:variorum_enter():100: _ERROR_VARIORUM_UNSUPPORTED_PLATFORM: Cannot set function pointers
(null):/data/sanmai/variorum/src/variorum/config_architecture.c:variorum_enter():100: _ERROR_VARIORUM_UNSUPPORTED_PLATFORM: Cannot set function pointers
(null):/data/sanmai/variorum/src/variorum/config_architecture.c:variorum_enter():100: _ERROR_VARIORUM_UNSUPPORTED_PLATFORM: Cannot set function pointers
(null):/data/sanmai/variorum/src/variorum/config_architecture.c:variorum_enter():100: _ERROR_VARIORUM_UNSUPPORTED_PLATFORM: Cannot set function pointers
Output Files:
  server4.powmon.dat
  server4.powmon.summary

Removing named semaphore /power_wrapperK
Removing named semaphore /power_wrapperL

sanmai@server4 /d/s/v/install (dev)> lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 12 (bookworm)
Release:        12
Codename:       bookworm

sanmai@server4 /d/s/v/install (dev)> uname -a
Linux server4 6.1.0-0.deb11.5-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.12-1~bpo11+1 (2023-03-05) x86_64 GNU/Linux

sanmai-NL avatar Apr 25 '23 09:04 sanmai-NL

Hi @sanmai-NL :

Please check supported architecture for each of the APIs here: https://variorum.readthedocs.io/en/latest/api/print_functions.html.

Powmon uses variorum_monitoring internally, and is not supported on the Intel DGPU yet. The error message clearly highlights this, saying that the platform is unsupported. We should document what powmon supports better, thanks for pointing this out.

Happy to accept a port from your end if you want to contribute ;)

tpatki avatar Apr 25 '23 09:04 tpatki

It highlights the lack of support through the flag name (normally a technical detail), while the function pointer stuff could as well refer to properties of the executable under analysis or of the operating system environment.

I'm not trying to use a DGPU and I'm not working in an HPC environment proper, just a regular Intel CPU:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Address sizes:                   39 bits physical, 48 bits virtual
Byte Order:                      Little Endian
CPU(s):                          4
On-line CPU(s) list:             0-3
Vendor ID:                       GenuineIntel
Model name:                      Intel(R) Celeron(R) N5105 @ 2.00GHz
CPU family:                      6
Model:                           156

sanmai-NL avatar Apr 25 '23 09:04 sanmai-NL

@sanmai-NL

Thanks for clarifying: your "build" link earlier pointed to the Intel DGPU port, so wasn't sure which architecture you were trying. But in the same vein, Intel family 6, model 156 (9C) is unsupported at the moment. This is also not listed as a server processor.
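For readers wondering how "family 6, model 156" relates to the "069C" in the issue title, here is a minimal sketch of the family/model decoding rules that the Intel SDM gives for CPUID leaf 1 (EAX). The helper `pack_signature` and the way it combines family and model into one value are assumptions for illustration and are not taken from the variorum sources. The sample value `0x000906C0` used in the note below is constructed only to match family 6, model 0x9C.

```c
#include <stdint.h>

/* Decode the display family and model from CPUID leaf 1 EAX.
 * The extended model is only folded in for base family 6 or 15,
 * and the extended family only for base family 15. */
static void decode_family_model(uint32_t eax, unsigned *family, unsigned *model)
{
    unsigned base_family = (eax >> 8) & 0xF;
    unsigned base_model  = (eax >> 4) & 0xF;
    unsigned ext_family  = (eax >> 20) & 0xFF;
    unsigned ext_model   = (eax >> 16) & 0xF;

    *family = (base_family == 0xF) ? base_family + ext_family : base_family;
    *model  = (base_family == 0x6 || base_family == 0xF)
                  ? ((ext_model << 4) | base_model)
                  : base_model;
}

/* Hypothetical helper: pack family and model into one value, e.g. 0x069C. */
static unsigned pack_signature(unsigned family, unsigned model)
{
    return (family << 8) | model;
}
```

Passing `0x000906C0` to `decode_family_model` gives family 6 and model 156 (0x9C), which `pack_signature` turns into 0x069C.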

Having said that, it should be fairly straightforward to add an initial port for Celeron, if we can find the MSRs and if you are willing to test for us (we don't have access to a Celeron node to test). Tagging @slabasan for her thoughts here.

tpatki avatar Apr 25 '23 09:04 tpatki

I'm willing to test. RAPL is supported, through the Linux Power Capping Framework, but interestingly not through some other tools.

sanmai-NL avatar Apr 25 '23 09:04 sanmai-NL

@sanmai-NL

Sounds good, thanks, we will work on an initial port, at least to print/profile power and support some basic registers.

Are you able to use msr-tools at your end (https://github.com/intel/msr-tools), and do you have access to the registers (some of them need privileged access)? We recommend msr-safe, but we should be able to help you build with the traditional msr kernel module as well.

tpatki avatar Apr 25 '23 09:04 tpatki

I used rdmsr on that system before. I have full access. The msr and intel_rapl_msr kernel modules are already loaded. I built msr-safe, but at this time I question its applicability. How can this module be kept in sync with the kernel? How can I be sure that this module won't itself have negative effects on the host OS? Do I have more trust in existing security controls and frameworks in combination with maintained kernel modules, versus in this custom module that claims to improve security? To use this and to gain value from it in the first place, cooperation with a system administrator role is essential. And that doesn't really match my environment right now.

sanmai-NL avatar Apr 25 '23 10:04 sanmai-NL

@sanmai-NL

You can use the regular msr module if you prefer, I am not here to convince you one way or the other. I can share that we have deployed msr-safe on many of our supercomputers at LLNL, and in other supercomputing centers. But yes, you will need some admin help to set that up and create an allowlist.

Without write permissions (you can check with wrmsr) on MSR_PKG_POWER_LIMIT or MSR_DRAM_POWER_LIMIT, you won't be able to set power caps. You can certainly read and gather data based on which MSRs you can access with rdmsr.

  • I am having some trouble finding the offsets for the MSRs on your processor in Intel docs, do you have it handy?

  • If not, I am going to assume that the addresses for some of the basic TSC/RAPL MSRs are similar. Can you check the following address offsets for me if possible?

    .msr_platform_info            = 0xCE,
    .ia32_time_stamp_counter      = 0x10,
    .msr_rapl_power_unit          = 0x606,
    .msr_pkg_power_limit          = 0x610,
    .msr_pkg_energy_status        = 0x611,
    .msr_pkg_power_info           = 0x614,
    .msr_dram_power_limit         = 0x618,
    .msr_dram_energy_status       = 0x619,
    .msr_dram_power_info          = 0x61C,
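As an alternative to running rdmsr by hand, the check requested above could be sketched in C roughly as follows. It assumes the stock Linux msr kernel module, which exposes each CPU's registers as a device file such as /dev/cpu/0/msr where the file offset is the MSR address. The names `read_msr`, `probe_msrs` and `candidates` are hypothetical and do not come from variorum. The addresses are the ones listed above and remain unverified for this Celeron.

```c
#define _POSIX_C_SOURCE 200809L /* for pread() */
#include <fcntl.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

struct msr_entry {
    const char *name;
    uint32_t addr;
};

/* Candidate addresses exactly as listed in the comment above. */
static const struct msr_entry candidates[] = {
    {"msr_platform_info", 0xCE},       {"ia32_time_stamp_counter", 0x10},
    {"msr_rapl_power_unit", 0x606},    {"msr_pkg_power_limit", 0x610},
    {"msr_pkg_energy_status", 0x611},  {"msr_pkg_power_info", 0x614},
    {"msr_dram_power_limit", 0x618},   {"msr_dram_energy_status", 0x619},
    {"msr_dram_power_info", 0x61C},
};

/* Read one 64-bit MSR through an msr device file.
 * Returns 0 on success, -1 if the device cannot be opened or the
 * register cannot be read (missing module, no permission, bad address). */
static int read_msr(const char *dev_path, uint32_t reg, uint64_t *value)
{
    int fd = open(dev_path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pread(fd, value, sizeof(*value), (off_t)reg);
    close(fd);
    return (n == (ssize_t)sizeof(*value)) ? 0 : -1;
}

/* Try every candidate and print whether it is readable.
 * Returns the number of registers that could be read. */
static int probe_msrs(const char *dev_path)
{
    int readable = 0;
    for (size_t i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
        uint64_t value = 0;
        if (read_msr(dev_path, candidates[i].addr, &value) == 0) {
            printf("%-26s 0x%03" PRIX32 " = 0x%" PRIx64 "\n",
                   candidates[i].name, candidates[i].addr, value);
            readable++;
        } else {
            printf("%-26s 0x%03" PRIX32 " unreadable\n",
                   candidates[i].name, candidates[i].addr);
        }
    }
    return readable;
}
```

Calling `probe_msrs("/dev/cpu/0/msr")` with sufficient privileges would report each register for CPU 0. A register that the processor does not implement shows up as unreadable rather than aborting the run.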

tpatki avatar Apr 25 '23 11:04 tpatki

I also found the Intel docs lack some information, at least regarding the RAPL MSRs for this CPU type. Will look into it soon.

sanmai-NL avatar Apr 26 '23 10:04 sanmai-NL

# printf '%s\n' 0xCE 0x10 0x606 0x610 0x611 0x614 0x618 0x619 0x61C | xargs -n 1 --verbose rdmsr
rdmsr 0xCE
40830f0811400
rdmsr 0x10
3186d418e9456
rdmsr 0x606
a0e03
rdmsr 0x610
4280c800dd8050
rdmsr 0x611
d262d06a
rdmsr 0x614
50
rdmsr 0x618
5400de00000000
rdmsr 0x619
0
rdmsr 0x61C
rdmsr: CPU 0 cannot read MSR 0x0000061c
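
A rough sketch of how the raw values above could be interpreted. It assumes that the bit layout the Intel SDM documents for MSR_RAPL_POWER_UNIT (power units in bits 3:0, energy units in bits 12:8, time units in bits 19:16) and for the first power limit in MSR_PKG_POWER_LIMIT (bits 14:0) also applies to this Celeron, which nothing in this thread confirms. The type and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    double power_w;  /* watts per raw power count */
    double energy_j; /* joules per raw energy count */
    double time_s;   /* seconds per raw time count */
} rapl_units;

/* Each field holds an exponent n; the unit is 1 / 2^n. */
static rapl_units decode_rapl_units(uint64_t raw)
{
    rapl_units u;
    u.power_w  = 1.0 / (double)(1ULL << (raw & 0xF));
    u.energy_j = 1.0 / (double)(1ULL << ((raw >> 8) & 0x1F));
    u.time_s   = 1.0 / (double)(1ULL << ((raw >> 16) & 0xF));
    return u;
}

/* Power limit #1 is stored in bits 14:0, in multiples of the power unit. */
static double decode_pkg_limit1_watts(uint64_t raw, rapl_units u)
{
    return (double)(raw & 0x7FFF) * u.power_w;
}
```

Under that assumption, 0xa0e03 from 0x606 decodes to units of 0.125 W, 1/16384 J and 1/1024 s, and 0x4280c800dd8050 from 0x610 decodes to a first package power limit of 10 W.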

sanmai-NL avatar Apr 26 '23 15:04 sanmai-NL

I've consulted Intel® 64 and IA-32 Architectures Software Developer’s Manual, Vol. 4, Section 2.7.

sanmai-NL avatar Apr 26 '23 15:04 sanmai-NL

Perhaps you should refer to Table 2-12 for the RAPL MSRs. It is one possible reading of the last row in Table 2-14:

See Table 2-6, Table 2-12, Table 2-13, and Table 2-14 for MSR definitions applicable to processors with CPUID signature 06_86H.

(Emphasis mine.)

sanmai-NL avatar Apr 26 '23 15:04 sanmai-NL

@tpatki Can I assist you any further?

sanmai-NL avatar May 09 '23 10:05 sanmai-NL

@slabasan I'm not sure which MSR addresses to use and didn't find good documentation. This is not a server processor, it is a mobile processor (Atom). Vol. 4, Sec. 2 on MSRs didn't have anything for Celeron. There is Table 2-12 (Pg 4664) that @sanmai-NL referenced for Atom processors based on Goldmont, but I'm not sure if these addresses apply to the Celeron (he wasn't able to read 0x61C for DRAM_POWER_INFO in his tests). I could use some help with verifying addresses; pulling together an initial port at least for reading/capping power should be easy after that.

tpatki avatar May 09 '23 17:05 tpatki

What can I do about this? Is there any contact person at Intel who could shed light on this?

sanmai-NL avatar May 25 '23 12:05 sanmai-NL

Hi @sanmai-NL : I don't know anyone at Intel who works on the Atom processor side, and the server side folks may not know. One thing we can do is I can give you the MSRs that do seem to work, which are for PKG_POWER. Without DRAM_POWER_INFO, I'm not sure we know the min/max range for the power cap, so not sure if it makes sense to add the DRAM ones.
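
For the read-only package path mentioned above, a minimal sketch of turning two MSR_PKG_ENERGY_STATUS samples into average power. It assumes the energy counter is a 32-bit value that wraps around and is scaled by the energy unit from MSR_RAPL_POWER_UNIT, as the Intel SDM describes for RAPL in general. Whether this Celeron behaves the same way is untested here, and the function name is hypothetical.

```c
#include <stdint.h>

/* Average power in watts between two energy counter samples.
 * Unsigned 32-bit subtraction gives the right delta even if the
 * counter wrapped once between the samples. */
static double avg_power_watts(uint32_t energy_before, uint32_t energy_after,
                              double energy_unit_j, double interval_s)
{
    uint32_t delta = energy_after - energy_before;
    return ((double)delta * energy_unit_j) / interval_s;
}
```

The sampling interval has to be short enough that the counter wraps at most once between reads, otherwise the delta is ambiguous.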

tpatki avatar May 25 '23 12:05 tpatki

@tpatki I'm still keen to move forward on this ... What can I do to give you the time/space to fix this or to fix it myself?

sanmai-NL avatar Feb 01 '24 10:02 sanmai-NL

@sanmai-NL Unfortunately our funding and priorities have shifted, and we will be unable to implement variorum for this architecture. We invite you to implement and test the port at your end, and we can review once we have met our other priorities. Thanks for your patience and understanding.

slabasan avatar Feb 01 '24 19:02 slabasan