
Flame Graph shows hexadecimal addresses instead of method names for Java callstack

Open RomanAgeev opened this issue 2 years ago • 5 comments

Describe the bug

Hello,

The Pixie flame graph displays hexadecimal addresses instead of method names for a Java callstack. We use Pixie via its integration with New Relic, and we specify the -XX:+PreserveFramePointer Java parameter.

The Java version:

openjdk version "11.0.18" 2023-01-17 LTS
OpenJDK Runtime Environment Zulu11.62+17-CA (build 11.0.18+10-LTS)
OpenJDK 64-Bit Server VM Zulu11.62+17-CA (build 11.0.18+10-LTS, mixed mode)

Node Kernel version:

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.2 LTS"
PRETTY_NAME="Ubuntu 22.04.2 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.2 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy

Kubernetes version (EKS):

Server Version: version.Info{Major:"1", Minor:"24+", GitVersion:"v1.24.13-eks-0a21954", GitCommit:"6305d65c340554ad8b4d7a5f21391c9fa34932cb", GitTreeState:"clean", BuildDate:"2023-04-15T00:33:45Z", GoVersion:"go1.19.8", Compiler:"gc", Platform:"linux/amd64"}

Expected behavior: Method names are shown in the Java callstack.

Screenshots: [flame graph screenshot attached, showing hexadecimal addresses in place of method names]

Thanks !

RomanAgeev avatar Jun 19 '23 09:06 RomanAgeev

Hi @RomanAgeev ,

If possible, can you send us the logs (px collect-logs or kubectl -n pl logs vizier-pem-abcde) or write up a method by which we can reproduce the issue?
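
For reference, a minimal sketch of both collection paths (the grep filter on the pod names and the output file naming are just illustrative; adjust the namespace to your deployment):

    # Bundle Vizier logs with the Pixie CLI (writes an archive in the current directory):
    px collect-logs

    # Or dump the logs of every vizier-pem pod individually (pod names differ per cluster):
    for pod in $(kubectl -n pl get pods -o name | grep vizier-pem); do
      kubectl -n pl logs "$pod" > "$(basename "$pod").log"
    done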

Thank you!

etep avatar Jun 21 '23 01:06 etep

Hi @etep,

You can find the vizier-pem logs here.

Also, here are the JVM parameters we use in the profiled process:

java -cp donkey.jar:sapjco3.jar com.fivetran.donkey.Main -Djava.security.egd=file:///dev/urandom -XX:+UnlockExperimentalVMOptions --illegal-access=debug

And here is the JAVA_TOOL_OPTIONS environment variable:

JAVA_TOOL_OPTIONS=-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/heapdumps/heapdump_f02bc6d6-82a5-4f12-a1f3-9d2f2084ea99.bin -XX:+PreserveFramePointer -Djava.util.logging.config.class=com.fivetran.donkey.log.FluentBitDonkeyLogConfig -Dcom.mchange.v2.log.MLog=jul -Xmx2048m -XX:+UseParallelGC -XX:GCTimeRatio=499 -XX:ParallelGCThreads=4 -agentpath:/opt/async-profiler/libasyncProfiler.so=start,event=cpu,file=/profiles/robberies_alter--oracle_hva--thirsty_mystify--f02bc6d6-82a5-4f12-a1f3-9d2f2084ea99.html

Hopefully this helps.

RomanAgeev avatar Jun 21 '23 12:06 RomanAgeev

In the log files, I found the following:

>grep 'Could not open symbol file' *                                                                                           
vizier-pem-4l77d.log:W20230621 11:56:43.626794  1059 java_symbolizer.cc:348] Java attacher [pid=19871]: Could not open symbol file: /proc/19871/root/tmp/px-java-symbolization-artifacts-19871-282713056-c999093280d6953a38b541dbda251a09/java-symbols.bin.
vizier-pem-vhbtb.log:W20230621 10:30:22.246654 11788 java_symbolizer.cc:348] Java attacher [pid=4039]: Could not open symbol file: /proc/4039/root/tmp/px-java-symbolization-artifacts-4039-59455412-c999093280d6953a38b541dbda251a09/java-symbols.bin.
vizier-pem-vhbtb.log:W20230621 10:45:38.612197 11788 java_symbolizer.cc:348] Java attacher [pid=11755]: Could not open symbol file: /proc/11755/root/tmp/px-java-symbolization-artifacts-11755-59545816-c999093280d6953a38b541dbda251a09/java-symbols.bin.
vizier-pem-vhbtb.log:W20230621 11:34:19.920790 11788 java_symbolizer.cc:348] Java attacher [pid=21411]: Could not open symbol file: /proc/21411/root/tmp/px-java-symbolization-artifacts-21411-59839565-c999093280d6953a38b541dbda251a09/java-symbols.bin.
vizier-pem-vhbtb.log:W20230621 11:58:41.348549 11788 java_symbolizer.cc:348] Java attacher [pid=10419]: Could not open symbol file: /proc/10419/root/tmp/px-java-symboliz

By way of explanation, symbolizing Java stack traces presents a challenge; we wrote a blog post about it here: https://blog.px.dev/cpu-profiling-java/. Succinctly, to get Java symbols we attach a JVMTI agent that receives a callback every time the Java process JIT-compiles code. In that callback, we record the symbol name and the virtual address, and those are written into a symbol file in the tmp space mounted in whatever container/pod is running the Java process.
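
To check whether the agent actually produced that file, you can look inside the Java container itself; the artifact path pattern below is copied from the log lines above, while the pod and container names are placeholders (this also assumes the Java image ships a shell):

    # From inside the Java pod (pod/container names are placeholders):
    kubectl exec -it <java-pod> -c <java-container> -- sh -c 'ls -l /tmp/px-java-symbolization-artifacts-*'
    # A successful attach should leave a java-symbols.bin file in that directory.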

It appears that the JVMTI symbolization agents are being attached with no problem (try grep 'AgentAttachApp finished' <log-files>).

To pursue the theory that the relevant symbolization files are not accessible, could you please (a shell sketch follows this list):

  1. Run an experiment where you record the PIDs of the processes you want to profile.
  2. Grab the logs.
  3. Check the logs for "Could not open symbol file" and see if the PIDs are in the interesting set from (1) above.
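
A rough sketch of those steps, assuming shell access to the node and using the main class from your command line as the process filter (the PEM logs host-namespace PIDs, so record them from the node rather than from inside the container):

    # 1. Record the host PIDs of the Java processes of interest (run on the node):
    pgrep -f com.fivetran.donkey.Main | sort -u > interesting-pids.txt

    # 2. Grab the vizier-pem logs (see the log-collection commands earlier in this thread).

    # 3. Extract the PIDs that failed symbol-file lookup and intersect the two sets:
    grep 'Could not open symbol file' vizier-pem-*.log \
      | grep -oE 'pid=[0-9]+' | grep -oE '[0-9]+' | sort -u > failing-pids.txt
    comm -12 interesting-pids.txt failing-pids.txt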

If this theory proves out, we need to figure out why the symbol file cannot be opened. I wonder if the symbol file was never even created (i.e., despite the agent attach process indicating success). Also, I wonder if the tmp volume is somehow not accessible from the PEM pod, i.e., the agent itself (which runs in the container namespace) created the file, but the PEM, which is in a different namespace, cannot access it. Or maybe the tmp volume has some other property that is a problem here.
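
One way to test that from the PEM side is to stat the same /proc path the symbolizer uses; the PID and directory hash below are copied from one of the log lines above, so substitute a currently running process:

    # Run on the node (the PEM reads the host /proc), or inside the PEM pod if its image ships a shell:
    ls -l /proc/4039/root/tmp/px-java-symbolization-artifacts-4039-59455412-c999093280d6953a38b541dbda251a09/
    # If the directory exists but java-symbols.bin is missing, the agent never wrote the file;
    # if the directory itself is missing, the container's tmp mount is not visible from here.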

Also, could you please send a few notes about how we might reproduce the issue? I brought up an EKS cluster, deployed Pixie, and also deployed our sock shop demo. Java symbols were shown correctly for that setup, so mine must diverge from yours somewhere. [Screenshot: flame graph with Java symbols resolved correctly]

etep avatar Jun 22 '23 00:06 etep

The problem looks to be transient, as I cannot reproduce it now. The Java pods we want to debug with the flame graph are short-lived (several minutes to tens of minutes). The problematic ones have already stopped, and the new ones seem to run without the problem.

I need more time to monitor the situation.

RomanAgeev avatar Jun 23 '23 13:06 RomanAgeev

Hi @RomanAgeev ,

Thank you for the update. We limit our agent attach rate to one process every 30 seconds. We did this to reduce the resource consumption (e.g., CPU use) incurred by the attach process, i.e., to spread it out over time so that peak consumption never climbs too high. While that policy may be on the conservative side, I do not think we are going to change it right now. If short-lived pods continue to present a profiling need, here are some ideas to try (a sketch follows below):

  1. Add the symbolization agent to the java command line, so that it is loaded immediately with the Java process.
  2. Try a Java native profiler to sidestep the symbolization problem (this may be a heavier lift to integrate with a k8s environment).
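
For idea 1, the shape would be something like the line below. The agent library path and options are placeholders (they depend on how the symbolization agent is packaged in your deployment); your existing JAVA_TOOL_OPTIONS already uses the same -agentpath: pattern for async-profiler:

    # Illustrative only: load a symbolization agent at JVM startup instead of relying on
    # the runtime attach; the library path and options are placeholders.
    export JAVA_TOOL_OPTIONS="$JAVA_TOOL_OPTIONS -agentpath:/path/to/symbolization-agent.so=<options>"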

etep avatar Jun 23 '23 17:06 etep