dd-trace-java
Core dump when enabling dd-agent and profiling (openj9)
Good morning,
After adding the Datadog agent to a Spring Boot service, we started to see core dumps in this service.
These are the options printed by the agent when it starts:
{"version":"0.102.0~b67f6e3380","os_name":"Linux","os_version":"3.10.0-1160.21.1.el7.x86_64","architecture":"amd64","lang":"jvm","lang_version":"11.0.15","jvm_vendor":"IBM Corporation","jvm_version":"openj9-0.32.0","java_class_version":"55.0","http_nonProxyHosts":"null","http_proxyHost":"null","enabled":true,"service":"my-servivce","agent_url":"http://10.145.0.84:8126","agent_error":false,"debug":false,"analytics_enabled":false,"sampling_rules":[{},{}],"priority_sampling_enabled":true,"logs_correlation_enabled":true,"profiling_enabled":true,"appsec_enabled":false,"dd_version":"0.102.0~b67f6e3380","health_checks_enabled":true,"configuration_file":"no config file present","runtime_id":"2b10b388-8f66-4d56-97cd-14a0e4541338","logging_settings":{"levelInBrackets":false,"dateTimeFormat":"'[dd.trace 'yyyy-MM-dd HH:mm:ss:SSS Z']'","logFile":"System.err","configurationFile":"simplelogger.properties","showShortLogName":false,"showDateTime":true,"showLogName":true,"showThreadName":true,"defaultLogLevel":"INFO","warnLevelString":"WARN","embedException":false},"cws_enabled":false,"cws_tls_refresh":5000}
The stderr when the crash happens is:
Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000080
Handler1=00007F7CB804EA20 Handler2=00007F7CB3D9C0E0 InaccessibleAddress=0000000000000000
RDI=00000000027B2910 RSI=0000000000000004 RAX=0080000000090300 RBX=00000000027B2910
RCX=0000000000400000 RDX=00007F7C9B72B130 R8=0000000000000000 R9=00000000027B9240
R10=00000000027B9208 R11=000000001C0C0100 R12=0000000000000000 R13=000000001C0C0100
R14=0000000000000000 R15=0000000000000005
RIP=00007F7CB29FD2E2 GS=0000 FS=0000 RSP=00007F7C9A53B120
EFlags=0000000000010202 CS=0033 RBP=00000000027B9278 ERR=0000000000000000
TRAPNO=000000000000000D OLDMASK=0000000000000000 CR2=0000000000000000
xmm0 0000003000000020 (f: 32.000000, d: 1.018558e-312)
xmm1 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm2 00007f7c9a53b5b0 (f: 2589177344.000000, d: 6.925473e-310)
xmm3 823f97e2cb4f6ac6 (f: 3410979584.000000, d: -7.548130e-298)
xmm4 38d93649e356e643 (f: 3814123008.000000, d: 7.586980e-35)
xmm5 3a000000c2000000 (f: 3254779904.000000, d: 2.524357e-29)
xmm6 269004cd5c0e6ac0 (f: 1544448768.000000, d: 6.058019e-123)
xmm7 00001969ed58d969 (f: 3982022912.000000, d: 1.380555e-310)
xmm8 0b0a090803020100 (f: 50462976.000000, d: 1.733947e-255)
xmm9 ffffffffffffffff (f: 4294967296.000000, d: -nan)
xmm10 7e0ebb6f592ae470 (f: 1495983232.000000, d: 1.607900e+299)
xmm11 09836ec563cd36ff (f: 1674393344.000000, d: 7.714130e-263)
xmm12 0000000000000001 (f: 1.000000, d: 4.940656e-324)
xmm13 08090a0b0c0d0e0f (f: 202182160.000000, d: 5.924543e-270)
xmm14 0000000000c167ee (f: 12675054.000000, d: 6.262309e-317)
xmm15 45c296f89cbe2ce8 (f: 2629709056.000000, d: 1.150649e+28)
Module=/opt/java/openjdk/lib/default/libj9jit29.so
Module_base_address=00007F7CB20AF000
Target=2_90_20220422_425 (Linux 3.10.0-1160.21.1.el7.x86_64)
CPU=amd64 (4 logical CPUs) (0x2e3aca000 RAM)
----------- Stack Backtrace -----------
jitWalkStackFrames+0x1482 (0x00007F7CB29FD2E2 [libj9jit29.so+0x94e2e2])
walkStackFrames+0xb3 (0x00007F7CB808E053 [libj9vm29.so+0x7f053])
_ZN32VM_BytecodeInterpreterCompressed3runEP10J9VMThread+0x5594 (0x00007F7CB80A62C4 [libj9vm29.so+0x972c4])
bytecodeLoopCompressed+0x95 (0x00007F7CB80A0D25 [libj9vm29.so+0x91d25])
(0x00007F7CB814B942 [libj9vm29.so+0x13c942])
---------------------------------------
JVMDUMP039I Processing dump event "gpf", detail "" at 2022/06/28 07:10:00 - please wait.
JVMDUMP039I Processing dump event "abort", detail "" at 2022/06/28 07:10:00 - please wait.
[dd.trace 2022-06-28 08:39:13:431 +0000] [OkHttp http://10.145.0.84:8126/...] WARN com.datadog.profiling.uploader.ProfileUploader - Failed to upload profile, received empty reply from http://10.145.0.84:8126/profiling/v1/input after uploading profile (Will not log errors for 5 minutes)
[dd.trace 2022-06-28 08:40:09:155 +0000] [OkHttp http://10.145.0.84:8126/...] INFO com.datadog.profiling.uploader.ProfileUploader - Upload done
I don't have the full core dump because the service runs in a Docker container and its storage is volatile.
Before the crash, we observed this exception in stdout:
--2022-06-27 11:04:16.305 WARN [my-service,,] 8 --- [CP Accept-26934] [, , , , , ] sun.rmi.transport.tcp : RMI TCP Accept-26934: accept loop for ServerSocket[addr=0.0.0.0/0.0.0.0,localport=26934] throws
-
java.net.SocketTimeoutException: Accept timed out
at java.base/java.net.PlainSocketImpl.socketAccept(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.accept(Unknown Source)
at java.base/java.net.ServerSocket.implAccept(Unknown Source)
at java.base/java.net.ServerSocket.accept(Unknown Source)
at java.rmi/sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(Unknown Source)
at java.rmi/sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
-2022-06-27 11:04:16.307 WARN [my-service,,] 8 --- [MI TCP Accept-0] [, , , , , ] sun.rmi.transport.tcp : RMI TCP Accept-0: accept loop for ServerSocket[addr=0.0.0.0/0.0.0.0,localport=45495] throws
-
java.net.SocketTimeoutException: Accept timed out
at java.base/java.net.PlainSocketImpl.socketAccept(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.accept(Unknown Source)
at java.base/java.net.ServerSocket.implAccept(Unknown Source)
at java.base/java.net.ServerSocket.accept(Unknown Source)
at java.rmi/sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(Unknown Source)
at java.rmi/sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Could you help me address this?
Hi @monwolf, this crash is happening inside the J9 JVM while it is walking the Java stack, which indicates a JVM bug.
Have you tried upgrading the J9 JVM to the latest patch level to see if it fixes the issue?
You could also try turning off the profiler with -Ddd.profiling.enabled=false to see if this avoids triggering the JVM bug.
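In a containerized service it is easy for a JVM flag to get lost, so it can help to confirm from inside the application which profiler setting the JVM actually sees. The snippet below is a minimal sketch under stated assumptions: it assumes the system property dd.profiling.enabled takes precedence over the DD_PROFILING_ENABLED environment variable, and the class and method names are hypothetical, not part of dd-trace-java.

```java
import java.util.Optional;

// Sketch only: reports the profiler setting visible to this JVM.
// ASSUMPTION (not verified against dd-trace-java): -Ddd.profiling.enabled wins over DD_PROFILING_ENABLED.
public class ProfilingFlagCheck {

    // Hypothetical helper. Returns empty when neither source is set (the tracer's own default applies);
    // otherwise true only for a case-insensitive "true", as Boolean.parseBoolean does.
    static Optional<Boolean> effectiveSetting(String sysProp, String envVar) {
        String raw = (sysProp != null) ? sysProp : envVar;
        return Optional.ofNullable(raw).map(v -> Boolean.parseBoolean(v.trim()));
    }

    public static void main(String[] args) {
        Optional<Boolean> setting = effectiveSetting(
                System.getProperty("dd.profiling.enabled"),
                System.getenv("DD_PROFILING_ENABLED"));
        System.out.println(setting
                .map(b -> "profiling explicitly " + (b ? "enabled" : "disabled"))
                .orElse("profiling not set; tracer default applies"));
    }
}
```

The tracer's own startup line also reports the effective value in its "profiling_enabled" field, which is the more authoritative place to check.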
Hi,
We tried with profiling disabled and it worked.
We are using the image ibm-semeru-runtimes:open-11-jre, which should be the latest J9 JRE 11 release (jdk-11.0.15+10_openj9-0.32.0), right?
Yes, that appears to be the latest release of OpenJ9. I've asked our profiler team whether they've seen this before.
Meanwhile, I'd recommend opening a ticket with the OpenJ9 team. If you send them the details you posted above, they can tell you more about the crash and whether it's a known issue with the JIT compiler (according to the report, the crash happened in libj9jit29.so).
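When a crash like this recurs, the first things to look at are the signal number (0000000b, i.e. 11, which is SIGSEGV on Linux), the Module line, and the top frame of the stack backtrace. The sketch below extracts those fields with regular expressions. The field names are copied from the report in this issue; the class and method names are hypothetical, and the parser has only been written against this one report format, so treat it as a triage aid rather than a general OpenJ9 report parser.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: pulls the signal, faulting module and top backtrace frame out of an OpenJ9
// "Unhandled exception" report. Field names follow the report in this issue only.
public class J9CrashTriage {

    static final class Summary {
        final int signal;       // e.g. 11 == SIGSEGV on Linux
        final String module;    // shared library containing the faulting instruction
        final String topFrame;  // first line after "Stack Backtrace"

        Summary(int signal, String module, String topFrame) {
            this.signal = signal;
            this.module = module;
            this.topFrame = topFrame;
        }

        // libj9jit*.so is the OpenJ9 JIT library; libj9vm*.so is the core VM.
        boolean inJitLibrary() {
            return module.substring(module.lastIndexOf('/') + 1).startsWith("libj9jit");
        }
    }

    private static String field(String report, String regex) {
        Matcher m = Pattern.compile(regex).matcher(report);
        if (!m.find()) {
            throw new IllegalArgumentException("field not found: " + regex);
        }
        return m.group(1);
    }

    static Summary summarize(String report) {
        // \b skips "J9Generic_Signal_Number" ('_' is a word char) and matches the plain "Signal_Number".
        int signal = Integer.parseInt(field(report, "\\bSignal_Number=([0-9A-Fa-f]+)"), 16);
        String module = field(report, "(?m)^Module=(\\S+)");
        String topFrame = field(report, "(?m)^-+ Stack Backtrace -+\\R(\\S+)");
        return new Summary(signal, module, topFrame);
    }

    public static void main(String[] args) throws IOException {
        Summary s = summarize(Files.readString(Path.of(args[0])));
        System.out.printf("signal=%d module=%s top=%s inJit=%b%n",
                s.signal, s.module, s.topFrame, s.inJitLibrary());
    }
}
```

Run against the stderr captured above, this would flag the crash as a SIGSEGV inside libj9jit29.so with jitWalkStackFrames as the top frame, which matches the reading that the JVM faulted while walking the Java stack.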
We'll try our luck there too.
@monwolf please try the latest version of the tracer, v1.24.2. If the issue still persists, please open a support ticket at https://www.datadoghq.com/support/
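To confirm which tracer build a service is actually running after an upgrade, the "version" field of the tracer's startup line (0.102.0~b67f6e3380 in the report above) can be compared with 1.24.2. This is a minimal sketch assuming the startup line is available as a string; it compares only the numeric major.minor.patch prefix, and the class and method names are hypothetical, not part of dd-trace-java.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: reads the tracer version from the startup JSON line and compares it with a minimum.
// Only the numeric major.minor.patch prefix is compared; suffixes such as "~b67f6e3380" are dropped.
public class TracerVersionCheck {

    // The leading quote keeps this from matching "dd_version", "os_version", "jvm_version", etc.
    private static final Pattern VERSION = Pattern.compile("\"version\":\"([^\"]+)\"");

    static String tracerVersion(String startupJson) {
        Matcher m = VERSION.matcher(startupJson);
        if (!m.find()) {
            throw new IllegalArgumentException("no \"version\" field in startup line");
        }
        return m.group(1);
    }

    private static int[] numericParts(String version) {
        String core = version.split("[~+-]", 2)[0];  // "0.102.0~b67f6e3380" -> "0.102.0"
        String[] parts = core.split("\\.");
        int[] out = new int[3];                        // missing parts count as 0
        for (int i = 0; i < out.length && i < parts.length; i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    // Negative if a < b, zero if equal, positive if a > b.
    static int compare(String a, String b) {
        int[] x = numericParts(a);
        int[] y = numericParts(b);
        for (int i = 0; i < 3; i++) {
            int c = Integer.compare(x[i], y[i]);
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        String v = tracerVersion(args[0]);
        System.out.println(v + (compare(v, "1.24.2") < 0 ? " is older than" : " is at least") + " 1.24.2");
    }
}
```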