camel-quarkus

perf-regression: evaluate the possibility to use async-profiler during runs

Open aldettinger opened this issue 2 years ago • 6 comments

Describe the feature here

In some cases, it could be relevant to use async-profiler during performance runs.

The installation of async-profiler is documented here. The kernel configuration should be edited as follows:

sudo sh -c 'echo 1 >/proc/sys/kernel/perf_event_paranoid'
sudo sh -c 'echo 0 >/proc/sys/kernel/kptr_restrict'
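Values written to /proc this way are lost on reboot, so it can help to check them before a profiled run. A minimal sketch, assuming that a perf_event_paranoid of 1 or lower and a kptr_restrict of 0 are sufficient; the helper name perf_settings_ok is hypothetical:

```shell
# Hypothetical helper: succeeds when the two kernel values passed in are
# permissive enough for the setup described above.
# $1: value of perf_event_paranoid, $2: value of kptr_restrict
perf_settings_ok() {
    [ "$1" -le 1 ] && [ "$2" -eq 0 ]
}

# Usage against the live kernel (Linux only), left commented out:
# perf_settings_ok "$(cat /proc/sys/kernel/perf_event_paranoid)" \
#                  "$(cat /proc/sys/kernel/kptr_restrict)" && echo "ready to profile"
```

The values are passed as arguments rather than read inside the function, so the check can be tried without touching /proc.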

From there, the regression sample app pom file could be extended with a profiler agent as below:

                        <groupId>org.codehaus.mojo</groupId>
                        <artifactId>exec-maven-plugin</artifactId>
                        <configuration>
                            <executable>java</executable>
                            <arguments>
                                <argument>-agentpath:/path-to-async-profiler/lib/libasyncProfiler.so=start,ann,threads,event=cpu,file=jvm_mode_report.html</argument>
                                <argument>-Dquarkus.http.port=${quarkus.http.port}</argument>
                                <argument>-jar</argument>
                                <argument>target/quarkus-app/quarkus-run.jar</argument>
                            </arguments>
                        </configuration>

A jvm_mode_report.html file is generated in target/cq-versions-under-test/version.
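For trying this outside Maven, the exec-maven-plugin arguments above amount to a plain java command line. A rough sketch, where agent_option is a hypothetical helper and port 8080 is an assumed value for quarkus.http.port:

```shell
# Hypothetical helper: builds the -agentpath option from the pom fragment.
# $1: async-profiler install directory (a placeholder, as in the issue)
# $2: name of the HTML report to write
agent_option() {
    printf '%s' "-agentpath:$1/lib/libasyncProfiler.so=start,ann,threads,event=cpu,file=$2"
}

# Example invocation, commented out because it needs a built application:
# java "$(agent_option /path-to-async-profiler jvm_mode_report.html)" \
#      -Dquarkus.http.port=8080 \
#      -jar target/quarkus-app/quarkus-run.jar
```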

It could be interesting to add a new tool option for this, such as -ap path-to-async-profiler. The pom generation logic would then need to change; XSLT would probably be a better fit than Maven Xpp3.

aldettinger avatar Oct 26 '23 09:10 aldettinger

I suggest using jfrSync too

franz1981 avatar Oct 26 '23 10:10 franz1981

I worked on something similar in recent days. I took @franz1981's fabulous quarkus-profiling-workshop as a base. The biggest challenge for me was to overcome the inability of wrk/wrk2 to send POST requests with a payload. I ended up using the hyperfoil CLI and a regular hyperfoil YAML file. With those two in place, I was able to attach the profiler only after the warmup was over. I unfortunately cannot share it here, because it contains some customer data. @aldettinger I see that our cq-perf-regression-scenario.hf.yaml does not have a warmup phase yet. I wonder how useful the profiling data would be without warmup. I also wonder whether and how the warmup-profiling coordination could be done through Maven.
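Attaching only after warmup could be sketched roughly as below. This is not the setup described above: it swaps the hyperfoil phase coordination for a fixed sleep, and it assumes the asprof launcher from async-profiler 3.x (older releases ship profiler.sh with the same -d, -e and -f options). The function name and the PROFILER_CMD variable are hypothetical.

```shell
# Hypothetical helper: wait for a fixed warmup period, then profile for a
# fixed duration. A fixed sleep is a simplification; a load driver with
# explicit phases gives better coordination.
# $1: JVM pid, $2: warmup in seconds, $3: profiling duration in seconds,
# $4: output file
profile_after_warmup() {
    sleep "$2"
    # PROFILER_CMD lets the launcher be overridden (e.g. profiler.sh).
    "${PROFILER_CMD:-asprof}" -d "$3" -e cpu -f "$4" "$1"
}

# Usage, commented out because it needs a running JVM:
# profile_after_warmup "$(pgrep -f quarkus-run.jar)" 60 30 after_warmup.html
```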

ppalaga avatar Oct 26 '23 12:10 ppalaga

Thanks @franz1981, --jfrsync is good to know. And well done Peter for the test setup @ppalaga.

Yes, the warmup is an open question. As of now, we have caught 1 regression in Quarkus and 2 in Camel without warmup. So it does not feel that bad, probably because warmups are almost the same when comparing multiple versions. There are multiple paths that can be taken: improving perf regression investigation, lowering duration, running on PRs... This needs a bit more thought.

aldettinger avatar Oct 26 '23 16:10 aldettinger

The problem with short durations is that you need to increase the sampling frequency of the profiler (e.g. to 1 kHz) to gather enough samples to be representative, but that means increasing the overhead, likely on low-powered machines. If we can assume that we care about macro effects on short-running programs on CI (which means few cycles to spare), you can decide to stop at tier 1 and just avoid the noisy attempts of C2 to compile the "hot" code, which likely won't be hot anyway. Still, if it is an A/B test meant to run a few times for a short time, what matters is that the 2 runs happen under similar conditions, so it makes sense to reduce the noise due to a constrained environment. It would require some JDK engineering expertise to validate this, though.
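As a rough illustration of those two knobs: -XX:TieredStopAtLevel=1 keeps the JVM on C1, and async-profiler's interval option takes nanoseconds, so 1 kHz corresponds to 1000000. The helper names and the profile.html file name below are hypothetical.

```shell
# Convert a sampling frequency in Hz to a sampling interval in nanoseconds.
interval_ns() {
    echo $((1000000000 / $1))
}

# Hypothetical helper: JVM options for a short A/B run.
# $1: async-profiler install directory, $2: sampling frequency in Hz
short_run_jvm_opts() {
    echo "-XX:TieredStopAtLevel=1 -agentpath:$1/lib/libasyncProfiler.so=start,event=cpu,interval=$(interval_ns "$2"),file=profile.html"
}

# Usage, commented out because it needs a built application:
# java $(short_run_jvm_opts /path-to-async-profiler 1000) -jar target/quarkus-app/quarkus-run.jar
```

Both versions under comparison would need the same options, otherwise the A/B conditions differ.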

franz1981 avatar Oct 26 '23 16:10 franz1981

I believe the easiest and safest approach is finding an adequate environment for running this. The best option, IMHO, is running these on a dedicated machine (one where you control the full stack).

That's what I do for Camel Core. Even though it's not automated or integrated with the contribution lifecycle, the perf tests are open and folks can try to reproduce them if they want to. The downside is that it requires fairly constant observation of what's going on, so that you catch regressions early.

If that's a road you deem feasible, there are some alternatives at our company and I can provide you with some pointers.

orpiske avatar Oct 26 '23 17:10 orpiske

Not to forget, it would also be good if the chosen implementation could cover native mode too. It's lower priority, but still interesting.

aldettinger avatar Nov 23 '23 15:11 aldettinger