jmh
jmh copied to clipboard
7903740: JMH: Perf event validation not working with skid options
Fixes https://bugs.openjdk.org/browse/CODETOOLS-7903740
Progress
- [x] Change must not contain extraneous whitespace
- [ ] Change must be properly reviewed (1 review required, with at least 1 Committer)
Issue
- CODETOOLS-7903740: JMH: Perf event validation not working with skid options (Bug - P4)
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jmh.git pull/132/head:pull/132
$ git checkout pull/132
Update a local copy of the PR:
$ git checkout pull/132
$ git pull https://git.openjdk.org/jmh.git pull/132/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 132
View PR using the GUI difftool:
$ git pr show -t 132
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jmh/pull/132.diff
Webrev
:wave: Welcome back galder! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.
@galderz This change now passes all automated pre-integration checks.
After integration, the commit message for the final commit will be:
7903740: JMH: Perf event validation not working with skid options
Reviewed-by: shade
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.
At the time when this comment was updated there had been 1 new commit pushed to the master branch:
- c2931c90fd4079271b69a8a393dfe277634f3e52: 7903722: JMH: Add xctrace-based perfnorm profiler for macOS
Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.
As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@shipilev) but any other Committer may sponsor as well.
➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).
Webrevs
- 07: Full - Incremental (f5691001)
- 06: Full - Incremental (d04c3a87)
- 05: Full - Incremental (069447e1)
- 04: Full - Incremental (bed0dbb2)
- 03: Full - Incremental (7f67829d)
- 02: Full - Incremental (2a938066)
- 01: Full - Incremental (8e3839a7)
- 00: Full (a9137910)
Mailing list message from Ian Rogers on jmh-dev:
An alternative to cycles:p is to use cycles:P (note the case) which means use the maximum precision level that is supported. It will give more accurate sampling than cycles:p when possible and works with perf stat: ``` $ sudo perf stat -e cycles:P true
Performance counter stats for 'true':
900,546 cycles:P
0.001261127 seconds time elapsed
0.001401000 seconds user 0.000000000 seconds sys ```
Thanks, Ian
On Fri, May 31, 2024 at 10:53?AM Galder Zamarre?o <galder at openjdk.org> wrote:
@galderz This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
@shipilev Makes sense, I'll look into validating that.
@shipilev Updated the PR with your proposed fix.
There are GHA test failures, not sure if they are due to this fix, please check?
Sure, checking...
Hmmm, I think the error is caused because profiler supported check now passes when running in CI, whereas in the past this tests didn't attempt to run.
Although you cannot see the Profiler is not supported or cannot be enabled, skipping test message in successful JMH CI runs (e.g. here), you can see that the test runs in 0.218 sec and so it must be passing because it sees the profiler is not supported.
With my change, it seems now the test thinks the profiler is supported, but then no secondary results are obtained.
I'm unsure how to deal with this yet.
@shipilev Are you sure these issues are also present in the main branch? I don't see any main branch CI runs and I don't see any recent CI jobs other than the one here.
I wonder if something has changed in the way CI environments run with regards to perf. In the May run I pointed above linux perf asm tests where not really executing. That means that perf stat --event cycles --log-fd 2 echo 1 was outright failing or it was returning <not supported>, but that seems to work fine now, see this test I've just run today.
It would be useful to have a way to see if main branch is really affected by the CI failures here. I can't see how it's not.
Actually my test shows the issue with switching to perf record. With perf stat we validate whether that event is supported and my test shows that it prints:
2024-08-16T07:50:50.3850275Z Performance counter stats for 'echo 1':
2024-08-16T07:50:50.3850840Z
2024-08-16T07:50:50.3851431Z <not supported> cycles
2024-08-16T07:50:50.3852100Z
2024-08-16T07:50:50.3852392Z 0.002036823 seconds time elapsed
2024-08-16T07:50:50.3852825Z
2024-08-16T07:50:50.3853024Z 0.000000000 seconds user
2024-08-16T07:50:50.3853730Z 0.002091000 seconds sys
Since not supported is in the output, the profiler test does not execute.
The perf record output on the other hand does not include that output so it assumes it's fine:
2024-08-16T07:50:51.5384315Z ##[group]Run perf record --event cycles echo 1
2024-08-16T07:50:51.5384952Z [36;1mperf record --event cycles echo 1[0m
2024-08-16T07:50:51.5451005Z shell: /usr/bin/bash -e {0}
2024-08-16T07:50:51.5451385Z ##[endgroup]
2024-08-16T07:50:51.6709500Z WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,
2024-08-16T07:50:51.6710972Z check /proc/sys/kernel/kptr_restrict and /proc/sys/kernel/perf_event_paranoid.
2024-08-16T07:50:51.6711978Z
2024-08-16T07:50:51.6712829Z Samples in kernel functions may not be resolved if a suitable vmlinux
2024-08-16T07:50:51.6714013Z file is not found in the buildid cache or in the vmlinux path.
2024-08-16T07:50:51.6714727Z
2024-08-16T07:50:51.6715312Z Samples in kernel modules won't be resolved at all.
2024-08-16T07:50:51.6715889Z
2024-08-16T07:50:51.6724799Z PERFILE2�������@�����������������������������G�������������7�a������������������������������������������������������������������������������������������������������������������������P�����T��������@���fv-az1567-205���������������������������������������������������P�����T��������@���6.5.0-1025-azure������������������������������������������������P�����T��������@���6.5.13����������������������������������������������������������P�����T��������@���x86_64����������������������������������������������������������P�������������������P�����T��������@���AMD EPYC 7763 64-Core Processor���������������������������������P�����T� �������@���AuthenticAMD,25,1,1���������������������������������������������P������
2024-08-16T07:50:51.6727231Z If some relocation was applied (e.g. kexec) symbols may be misresolved
2024-08-16T07:50:51.6727979Z even with a suitable vmlinux or kallsyms file.
2024-08-16T07:50:51.6728384Z
2024-08-16T07:50:51.6730695Z �������0�������P����������������@���/usr/lib/linux-azure-6.5-tools-6.5.0-1025/perf������������������@���record����������������������������������������������������������@���--event���������������������������������������������������������@���cycles����������������������������������������������������������@���echo������������������������������������������������������������@���1���������������������������������������������������������������P�����������������������������������������G�������������7�a�����������������������������������������������������������������������������������������������@���cpu-clock�����������������������������������������������������������������������������������P�����\
2024-08-16T07:50:51.6733242Z ����������@���0-3����������������������������������������������������������������@���0-1�������������������������������������������������������������@���2-3����������������������������������������������������������������������������������������������@���0-3�����������������������������������������������������������������������������P�����l���������������0���������������@���0-3�������������������������������������������������������������P������������������@���cpu����������������������������������������������������������������@���software�������������������������������������������������������� ���@���uprobe�������������������������������������������������������������@���breakpoint���������������������������������������������������������@���tracepoint���������������������������������������������������������@���kprobe����������������������������������������������������������
2024-08-16T07:50:51.6734474Z Couldn't record kernel reference relocation symbol
2024-08-16T07:50:51.6735191Z Symbol resolution may be skewed if relocation was used (e.g. kexec).
2024-08-16T07:50:51.6735777Z Check /proc/kallsyms permission or run as root.
2024-08-16T07:50:51.8070024Z 1
2024-08-16T07:50:51.8071031Z [ perf record: Woken up 1 times to write data ]
2024-08-16T07:50:51.8073958Z ���@���msr�������������������������������������������������������������P����� ������������������������P�����X�����������������������������������������������������������������������������P�����������������P�����������������P�����������������P�����������������P������ �������O�����8��������l�W4����O"`{���������������������������E�������������������������������������c ���������������������������c ���������������������������c ���������������������������c ������N�����8���������������cpu-clock�����������������������I�����(��������c ����������������������J�����������������P���������������bpf_trampoline_6442559410������������������������������������8�c ��c ��perf-exec�������������������������������R���������� 0�c ��c ��echo����c ��c ���,��-����������
2024-08-16T07:50:51.8075631Z ����p�c ��c ����IV���@������� ������������,�������le�����������/usr/bin/echo���c ��c ������-����������
2024-08-16T07:50:51.8076240Z [ perf record: Captured and wrote 0.000 MB (null) ]
2024-08-16T07:50:51.8077114Z ������c ��c ���PY�`���������� ������������������oٮ�����������/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2��c ��c ��0 ��-����������
2024-08-16T07:50:51.8078299Z ����h�c ��c ���������� ��������������������������������������������[vdso]��c ��c ���F��-����������
2024-08-16T07:50:52.6748187Z ������c ��c ����"�`���P������������������������J��G����������/usr/lib/x86_64-linux-gnu/libc.so.6�����c ��c ������-���������� ����0�Y�F�����c ��c ��=��-����������������� ����0�ǖ������c ��c ������-����������������� ����0�ʻC�����c ��c ��ǚ��-����������������������8�c ��` ��c ��` ���t��-���c ��c ���r��-����������D������
Looking at the perf record output I don't see any specific signal to see if perf record works for that event or not.
I'm playing around with this idea to see if it works: do a perf record and if that doesn't fail, see with perf report --stdio if any events are in the perf.data file.
I think my idea works.
On my linux box, for cycles, I get:
$ perf record --event cycles echo 1
1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.016 MB perf.data (7 samples) ]
$ perf report --stdio | grep 'cycles' | wc -l
1
For cycles:p I get:
❯ perf record --event cycles:p echo 1
1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.016 MB perf.data (8 samples) ]
$ perf report --stdio | grep 'cycles:p' | wc -l
1
Then on CI (see test) I see:
$ perf record --event cycles echo 1
...
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.000 MB (null) ]
$ perf report --stdio | grep 'cycles' | wc -l
incompatible file format (rerun with -v to learn more)
0
Ok, let's see what CI thinks about this.
It checks if the perf record command shows that no data was recorded. This only works if we explicitly tell perf record to record to an output file, otherwise it seems no perf.data is generated regardless, and the command output always shows that no data was captured.
Hmmmm that doesn't fully work on CI :\ (see output here). Doing further investigations.
Ok, output of perf record is not good enough on CI, so decided to use perf report approach to find samples. Let's see if CI is happy with that. It works as expected on linux box.
@shipilev it run out of time waiting for a macos runner, any suggestions? Crucially the linux builds seem fine, which were failing before the latest fix.
/test
I see Mac runners are scarce for 11, and anyway there is little opportunity for us to debug it. So I would instead go and switch to latest runners here, and see if it helps: https://github.com/openjdk/jmh/pull/134
Mailing list message from Ian Rogers on jmh-dev:
On Thu, Sep 5, 2024 at 8:54?AM Galder Zamarre?o <galder at openjdk.org> wrote:
On Thu, 22 Aug 2024 12:54:10 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:
Well, I would have expected we parsed `perf record` error and error code directly?
% perf record -e blah echo 1 event syntax error: 'blah' ___ parser error Run 'perf list' for a list of valid events
Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>]
\-e\, \-\-event \<event> event selector\. use \'perf list\' to list available events% echo $? 129
% perf record -e cycles:p echo 1 Error: cycles:p: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'
% echo $? 255
% perf record -e cycles echo 1 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.013 MB perf.data (16 samples) ]
% echo $? 0
Yes, I understand it actually does not test whether the counters _report_ non-zero values, but that's okay. We only do a light-weight validation that `perf record` even accepts these.
Failing hard when some events cannot be supported is also proper, I'd think. So there is no need to verify them one by one. Keep it as simple as possible?
I'd think so :-) Fwiw, ARM have added a cycles event to their L3 cache (in ARM language dsu) and so now "perf record -e cycles" fails on Ampere CPUs as L3 cache events can't sample. A proposed work around is to make it so that events don't fail.
The thing is that `perf record` does not always fail when it doesn't work. For example in container ([example](https://github.com/galderz/github-actions/actions/runs/10723692355/job/29737549503)):
$ perf record --event cycles --output perf-record-validate.data echo 1 WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted, check /proc/sys/kernel/kptr_restrict and /proc/sys/kernel/perf_event_paranoid.
There is special fallback for cycles to try to move it to a timer based (aka software) event cpu-clock or task-clock.
Samples in kernel functions may not be resolved if a suitable vmlinux file is not found in the buildid cache or in the vmlinux path.
Samples in kernel modules won't be resolved at all.
If some relocation was applied (e.g. kexec) symbols may be misresolved even with a suitable vmlinux or kallsyms file.
Couldn't record kernel reference relocation symbol Symbol resolution may be skewed if relocation was used (e.g. kexec). Check /proc/kallsyms permission or run as root. 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.001 MB perf-record-validate.data (3 samples) ]
$ echo $? 0
$ perf report -q -i perf-record-validate.data --stdio | grep cycles | wc -l 0
There's a --stats option for perf report, for example:
``` $ sudo perf record -e cycles:P -o - true | perf report -i - -- stats [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.058 MB - ]
Aggregated stats: TOTAL events: 459 MMAP events: 295 (64.3%) COMM events: 2 ( 0.4%) EXIT events: 1 ( 0.2%) SAMPLE events: 7 ( 1.5%) MMAP2 events: 4 ( 0.9%) KSYMBOL events: 86 (18.7%) BPF_EVENT events: 36 ( 7.8%) ATTR events: 1 ( 0.2%) FINISHED_ROUND events: 1 ( 0.2%) ID_INDEX events: 1 ( 0.2%) THREAD_MAP events: 1 ( 0.2%) CPU_MAP events: 1 ( 0.2%) EVENT_UPDATE events: 1 ( 0.2%) TIME_CONV events: 1 ( 0.2%) FEATURE events: 20 ( 4.4%) FINISHED_INIT events: 1 ( 0.2%) cycles:ppp stats: SAMPLE events: 7 ```
A lot of the issues I was having was because I assumed that in container `perf record` would fail but it doesn't, and later on you get errors saying that no events were recorded.
And in any case, if `perf record` fails, that's already handled by the current code when it calls `!failMsg.isEmpty()`.
Failing hard when some events cannot be supported is also proper, I'd think. So there is no need to verify them one by one. Keep it as simple as possible?
I'll try to craft a command that checks all the events in one go and if any is missing fail hard.
Perhaps directly using task-clock:u (ie restricting events to user land) can lessen the problems.
Thanks, Ian
Thanks for the comment Ian.
There's a --stats option for perf report, for example:
$ sudo perf record -e cycles:P -o - true | perf report -i - -- stats [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.058 MB - ] Aggregated stats: TOTAL events: 459 MMAP events: 295 (64.3%) COMM events: 2 ( 0.4%) EXIT events: 1 ( 0.2%) SAMPLE events: 7 ( 1.5%) MMAP2 events: 4 ( 0.9%) KSYMBOL events: 86 (18.7%) BPF_EVENT events: 36 ( 7.8%) ATTR events: 1 ( 0.2%) FINISHED_ROUND events: 1 ( 0.2%) ID_INDEX events: 1 ( 0.2%) THREAD_MAP events: 1 ( 0.2%) CPU_MAP events: 1 ( 0.2%) EVENT_UPDATE events: 1 ( 0.2%) TIME_CONV events: 1 ( 0.2%) FEATURE events: 20 ( 4.4%) FINISHED_INIT events: 1 ( 0.2%) cycles:ppp stats: SAMPLE events: 7 ```
That looks very promising! Just tried on container and seems to do what I'm looking for. I'll amend the PR.
I'll also ~rebase~ merge latest head so that the new runners kick in.
Ian, I've pushed changes based on your feedback. I think that does the job very nicely.
Wooo hoo, all tests passing, @shipilev can you have another look?
Well, I just went to my m7g.16xlarge instance, and:
% java -jar benchmarks.jar -prof perfasm -f 1 JMHSample_35_Profilers.Atomic
Profilers failed to initialize, exiting.
Perf event count not a number for one of events [cycles]: [Can't parse sample, err = -14
0x224c [0x30]: failed to process type: 9
Error:
failed to process sample
0
]
It works without the patch. I think checking code is being too smart for its own good? Reiterating my previous suggestion: let's not try to parse the recordings, and instead "just" parse the output perf record -e incorrect-event-blah for testing the events.
That is to say, I am willing to accept the simple patch like:
diff --git a/jmh-core/src/main/java/org/openjdk/jmh/profile/LinuxPerfAsmProfiler.java b/jmh-core/src/main/java/org/openjdk/jmh/profile/LinuxPerfAsmProfiler.java
index c948532f..69d74a86 100644
--- a/jmh-core/src/main/java/org/openjdk/jmh/profile/LinuxPerfAsmProfiler.java
+++ b/jmh-core/src/main/java/org/openjdk/jmh/profile/LinuxPerfAsmProfiler.java
@@ -46,7 +46,7 @@ public class LinuxPerfAsmProfiler extends AbstractPerfAsmProfiler {
public LinuxPerfAsmProfiler(String initLine) throws ProfilerException {
super(initLine, "cycles");
- String[] senseCmd = { PerfSupport.PERF_EXEC, "stat", "--event", Utils.join(requestedEventNames, ","), "--log-fd", "2", "echo", "1" };
+ String[] senseCmd = { PerfSupport.PERF_EXEC, "record", "-q", "--event", Utils.join(requestedEventNames, ","), "-o", "-", "echo", "1"};
Collection<String> failMsg = Utils.tryWith(senseCmd);
if (!failMsg.isEmpty()) {
...and everything else needs a very hard justification and a lot of testing to make sure it does not break in the cases it is supposed to work. The fact the PR code fails on my first actual try does not inspire confidence, to be honest.
Remember: the capabilities check in constructor is an opportunistic check, it can pass by accident, and users would discover their profiler does not really work later when looking at the results. The check should NOT fail by accident, when the profiler would actually work, if not for a misbehaving capability check.
Remember: the capabilities check in constructor is an opportunistic check, it can pass by accident, and users would discover their profiler does not really work later when looking at the results. The check should NOT fail by accident, when the profiler would actually work, if not for a misbehaving capability check.
Makes sense and I agree with you, but my previous efforts on this PR show that the CI is getting in the way of what you said above. E.g. see what happened in https://github.com/galderz/jmh/actions/runs/10384414105/job/28751371192, where the check passes by accident but then the profiler contains no data, so the test fails.
Closed the PR by mistake.
That is to say, I am willing to accept the simple patch like:
diff --git a/jmh-core/src/main/java/org/openjdk/jmh/profile/LinuxPerfAsmProfiler.java b/jmh-core/src/main/java/org/openjdk/jmh/profile/LinuxPerfAsmProfiler.java index c948532f..69d74a86 100644 --- a/jmh-core/src/main/java/org/openjdk/jmh/profile/LinuxPerfAsmProfiler.java +++ b/jmh-core/src/main/java/org/openjdk/jmh/profile/LinuxPerfAsmProfiler.java @@ -46,7 +46,7 @@ public class LinuxPerfAsmProfiler extends AbstractPerfAsmProfiler { public LinuxPerfAsmProfiler(String initLine) throws ProfilerException { super(initLine, "cycles"); - String[] senseCmd = { PerfSupport.PERF_EXEC, "stat", "--event", Utils.join(requestedEventNames, ","), "--log-fd", "2", "echo", "1" }; + String[] senseCmd = { PerfSupport.PERF_EXEC, "record", "-q", "--event", Utils.join(requestedEventNames, ","), "-o", "-", "echo", "1"}; Collection<String> failMsg = Utils.tryWith(senseCmd); if (!failMsg.isEmpty()) {...and everything else needs a very hard justification and a lot of testing to make sure it does not break in the cases it is supposed to work. The fact the PR code fails on my first actual try does not inspire confidence, to be honest.
Remember: the capabilities check in constructor is an opportunistic check, it can pass by accident, and users would discover their profiler does not really work later when looking at the results. The check should NOT fail by accident, when the profiler would actually work, if not for a misbehaving capability check.
I will give your suggestion a go and see if LinuxPerfAsmProfilerTest passes on CI with that fix.