
Perfevent configuration error for AMD chips

Open Osmanyasal opened this issue 1 year ago • 10 comments

Hello, we're using Performance Co-Pilot (PCP) as a tool in one of our projects, but we're having some issues monitoring AMD PMU events. I'd be very grateful if you could spare some time to help me out with this issue.

We list all the available PMUs on our machines, and we generally use "hardware-specific" PMUs to monitor some predefined events. We update the /var/lib/pcp/pmdas/perfevent/perfevent.conf file and reinstall the perfevent agent with our configuration. This works well for Intel CPUs but not for AMD CPUs. Here are the details.

This is the perfevent.conf file on our Intel machine; when we install it, it works: [image]

This is one of our AMD machines; as you can see, it gives errors: [image]

With PCP we list all available PMUs along with their available events.

This is for Intel: [image]

And this is for AMD: [image]

PS: the kernel paranoid setting is -1 on all of my machines.

I can monitor [perf] events successfully on both computers, but that is not what we're interested in. My guess is that the AMD PMU names are not recognized by PCP, but we couldn't fix it.

thanks in advance. Osman

Osmanyasal avatar May 05 '23 18:05 Osmanyasal

@jpwhite4 @hkshaw1990 any clues for our friend here?

natoscott avatar May 08 '23 04:05 natoscott

Hello again. We have been trying on different AMD machines with different architectures, but still no luck. Do you have any updates regarding this issue?

Osmanyasal avatar May 14 '23 09:05 Osmanyasal

@Osmanyasal seems like not - if you could provide a remote login to such a system, I could take a quick look for you.

natoscott avatar May 14 '23 22:05 natoscott

I don't think I can, because they're our school's computers. If you can describe a starting point for us, we can check it to understand what's wrong. @FatihTasyaran

Osmanyasal avatar May 15 '23 08:05 Osmanyasal

@Osmanyasal I was able to get a reservation on an AMD machine today.

You'll find that the list of supported names for your platform is reported by the PCP perfevent agent in the file /var/log/pcp/pmcd/perfevent.log once it's been ./Install'd for the first time.

You should be able to find the events you're interested in there and add them to a new section of perfevent.conf for your processor family (in /var/lib/pcp/pmdas/perfevent). I had no problems doing so with the latest PCP code, so hopefully this is enough to get you started too.

cheers.

natoscott avatar May 18 '23 10:05 natoscott
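The cross-check described in the comment above (PMU names used in perfevent.conf versus names the agent reports in perfevent.log) can be sketched roughly as follows. This is a minimal sketch, not part of PCP. It assumes that sections in perfevent.conf start with a `[pmu_name]` header line (possibly holding several whitespace-separated names), and it treats the log as free text in which a supported PMU name appears somewhere as a whole token; the exact log layout is not assumed. The function names are made up for illustration. Any header that is not a PMU name will also be reported, so read the output with that in mind.

```python
import re
from pathlib import Path

# Paths quoted in the thread; adjust if your installation differs.
CONF_PATH = Path("/var/lib/pcp/pmdas/perfevent/perfevent.conf")
LOG_PATH = Path("/var/log/pcp/pmcd/perfevent.log")

# Assumption: a section header is a line of the form "[name]" or "[name1 name2]".
SECTION_RE = re.compile(r"^\s*\[([^\]]+)\]\s*$")


def conf_pmu_names(conf_text):
    """Return the names used in section headers of perfevent.conf text."""
    names = []
    for line in conf_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        match = SECTION_RE.match(line)
        if match:
            names.extend(match.group(1).split())
    return names


def names_missing_from_log(conf_text, log_text):
    """Return conf section names that never appear as a whole token in the log."""
    missing = []
    for name in conf_pmu_names(conf_text):
        # Whole-token match: "amd64_fam17h" in the log must not satisfy
        # "amd64_fam17h_zen2" in the conf, and vice versa.
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        if not re.search(pattern, log_text):
            missing.append(name)
    return missing


if __name__ == "__main__":
    if CONF_PATH.exists() and LOG_PATH.exists():
        conf = CONF_PATH.read_text(errors="replace")
        log = LOG_PATH.read_text(errors="replace")
        for name in names_missing_from_log(conf, log):
            print("section name not found in perfevent.log:", name)
```

A section name that is printed here is a candidate for the kind of mismatch discussed in this thread, where the name used in the configuration differs from the name in the log.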

That's great, it works now. But there is one issue: we use showevtinfo (program in pcp) to report PMU names along with their corresponding events, and this tool reports the PMU name as "amd64_fam17h_zen2 (AMD64 Fam17h Zen2)", so we took the first part as our PMU name. According to the log file, however, only amd64_fam17h (without _zen2) is supported.

Osmanyasal avatar May 18 '23 10:05 Osmanyasal

However, when I checked the log file at /var/log/pcp/pmcd/perfevent.log on our Zen3 machine, it only lists perf:: events. There are no other PMUs such as amd64_fam19h. But showevtinfo says the machine supports amd64_fam19h, and there are many events related to it. I installed PCP version 6.0.4-1. What could be the issue here?

Osmanyasal avatar May 18 '23 10:05 Osmanyasal

| [...] showevtinfo (program in pcp)

This isn't a program from PCP, so I don't know what it is listing. The PMDA logfile is the one source of truth for PCP, those are all the hardware events that the kernel tells us about.

| [...] what could be the issue here?

The only other possible thing that might be involved would be a security system like SELinux - it might be preventing events being visible from a daemon (like pmdaperfevent) that are visible in a less restricted context like an interactive shell.

Either way, I don't think there's a PCP issue here (we regularly test with selinux here @Red Hat and there are no known issues).

natoscott avatar May 18 '23 22:05 natoscott

Sorry for my misleading previous entry. showevtinfo is a demo program provided by libpfm4 that lists all available pmus and related events on the system.

Since PCP uses libpfm4 for PMU event monitoring (as far as I know), I expect anything reported by libpfm4 to be valid for PCP as well, which it is.

I set kernel.paranoid to -1 to see and report PMU events. All of this works for Zen2 but didn't work on our Zen3 machine: PCP doesn't display any PMUs other than perf there, and I couldn't see why.

Could you elaborate on this statement: "The only other possible thing that might be involved would be a security system like SELinux - it might be preventing events being visible from a daemon (like pmdaperfevent) that are visible in a less restricted context like an interactive shell."

Any breadcrumbs would be appreciated. Thank you in advance, Osman.

Osmanyasal avatar May 18 '23 22:05 Osmanyasal
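Since the comment above relies on the paranoid setting being -1 on every machine, it may be worth confirming the value actually in effect on the Zen3 host. The sketch below assumes "kernel.paranoid" in the thread refers to the Linux sysctl kernel.perf_event_paranoid, which is exposed at /proc/sys/kernel/perf_event_paranoid; the function name is made up for illustration.

```python
from pathlib import Path

# Assumption: the thread's "kernel.paranoid" means kernel.perf_event_paranoid.
PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")


def read_paranoid_level(path=PARANOID_PATH):
    """Return the integer paranoid level, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print("perf_event_paranoid:", read_paranoid_level())
```

Running this on both the Zen2 and Zen3 machines gives a quick way to rule out a difference in this setting between them.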

If it was an SELinux issue (unlikely), you would see lots of AVC errors in your syslog file when pmdaperfevent attempts access via the kernel interface.

natoscott avatar May 18 '23 22:05 natoscott
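The AVC check suggested above can be sketched as a simple log scan. This is only an illustration: it assumes denials are logged in the usual audit style (a line containing "avc:  denied" and a comm="..." field naming the process), and the log file to scan is passed on the command line because its location varies between systems (for example /var/log/audit/audit.log or /var/log/messages, both assumptions here). The function name is made up.

```python
import re
import sys

# Assumption: SELinux denials look roughly like
#   ... avc:  denied  { read } for  pid=1234 comm="pmdaperfevent" ...
AVC_DENIED_RE = re.compile(r"avc:\s+denied", re.IGNORECASE)


def find_avc_denials(log_lines, process="pmdaperfevent"):
    """Return log lines that look like AVC denials mentioning `process`."""
    return [
        line.rstrip("\n")
        for line in log_lines
        if AVC_DENIED_RE.search(line) and process in line
    ]


if __name__ == "__main__":
    # Usage: python avc_scan.py <logfile> [<logfile> ...]
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            for hit in find_avc_denials(handle):
                print(hit)
```

If this prints nothing for the relevant log files, SELinux is probably not what is hiding the PMUs, which matches the "unlikely" assessment above.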