dd-trace-java v1.31.2 crashes the JVM
Hi DD team,
We've experienced JVM crashes caused by segmentation faults, with stack traces referencing dd profiling. It's a containerized application running the following versions:
Tracer version: v1.31.2
Java VM: OpenJDK 64-Bit Server VM Temurin-21.0.2+13
OS: Alpine Linux v3.19
Datadog agent: 7.42.1
The JVM has mostly crashed within the first two minutes after startup, but on rarer occasions it has crashed up to 6 hours after application start.
I will make sure to bring our versions up to date; meanwhile, we'll disable profiling with DD_PROFILING_ENABLED=false.
Attached are two log entries.
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f82be65e43b, pid=1, tid=31
#
# JRE version: OpenJDK Runtime Environment Temurin-21.0.2+13 (21.0.2+13) (build 21.0.2+13-LTS)
# Java VM: OpenJDK 64-Bit Server VM Temurin-21.0.2+13 (21.0.2+13-LTS, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C [ld-musl-x86_64.so.1+0x5943b] strlen+0xd
#
# Core dump will be written. Default location: //core.1
#
# The JFR repository may contain useful JFR files. Location: /tmp/2024_04_10_09_23_20_1
#
# If you would like to submit a bug report, please visit:
# https://github.com/adoptium/adoptium-support/issues
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
--------------- S U M M A R Y ------------
Command Line: -javaagent:./agents/dd-java-agent.jar -XX:FlightRecorderOptions=stackdepth=256 -Dcom.sun.org.apache.xml.internal.security.ignoreLineBreaks=true -Dcom.ibm.mq.cfg.useIBMCipherMappings=false /app.jar
Host: Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz, 2 cores, 3G, Alpine Linux v3.19
Time: Wed Apr 10 09:24:21 2024 CEST elapsed time: 127.619058 seconds (0d 0h 2m 7s)
--------------- T H R E A D ---------------
Current thread (0x00007f825afce880): JavaThread "dd-profiler-recording-scheduler" daemon [_thread_in_native, id=31, stack(0x00007f825ad2f000,0x00007f825ae2fa78) (1026K)]
Stack: [0x00007f825ad2f000,0x00007f825ae2fa78], sp=0x00007f825ae2eb48, free space=1022k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [ld-musl-x86_64.so.1+0x5943b] strlen+0xd
C [libjavaProfiler2790011097090240288.so+0x18737] Recording::writeSettings(Buffer*, Arguments&)+0xd7
C [libjavaProfiler2790011097090240288.so+0x25b09] Recording::switchChunk(int)+0xf9
C [libjavaProfiler2790011097090240288.so+0x25bef] FlightRecorder::dump(char const*, int)+0x6f
C [libjavaProfiler2790011097090240288.so+0x3cfd1] Profiler::dump(char const*, int)+0x201
C [libjavaProfiler2790011097090240288.so+0x48f37] Java_com_datadoghq_profiler_JavaProfiler_dump0+0x47
Event: 11.935 Thread 0x00007f8174166820 Exception <a 'java/lang/IncompatibleClassChangeError'{0x00000000c71b2228}: Found class java.lang.Object, but interface was expected> (0x00000000c71b2228)
thrown [src/hotspot/share/interpreter/linkResolver.cpp, line 840]
Event: 11.978 Thread 0x00007f81131017c0 Exception <a 'java/lang/NoSuchMethodError'{0x00000000c944ba80}: 'long java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(java.lang.Object, java.lang.Object, java.lang.Object)'> (0x00000000c944ba80)
thrown [src/hotspot/share/interpreter/linkResolver.cpp, line 772]
Event: 12.024 Thread 0x00007f81131017c0 Exception <a 'java/lang/NoSuchMethodError'{0x00000000c91ecb48}: 'java.lang.Object java.lang.invoke.DirectMethodHandle$Holder.newInvokeSpecial(java.lang.Object, java.lang.Object, java.lang.Object, java.lang.Object, java.lang.Object, java.lang.Object, java.lang.Object, int)'> (0x00000000c91ecb48)
thrown [src/hotspot/share/interpreter/linkResolver.cpp, line 772]
Event: 12.025 Thread 0x00007f81131017c0 Exception <a 'java/lang/NoSuchMethodError'{0x00000000c9083eb0}: 'java.lang.Object java.lang.invoke.DirectMethodHandle$Holder.invokeSpecial(java.lang.Object, java.lang.Object, java.lang.Object, java.lang.Object, java.lang.Object, java.lang.Object, java.lang.Object, java.lang.Object, int)'> (0x00000000c9083eb0)
thrown [src/hotspot/share/interpreter/linkResolver.cpp, line 772]
Event: 14.664 Thread 0x00007f810f692790 Exception <a 'java/lang/NoSuchMethodError'{0x00000000c7c99770}: 'double java.lang.invoke.DelegatingMethodHandle$Holder.reinvoke_L(java.lang.Object, java.lang.Object)'> (0x00000000c7c99770)
thrown [src/hotspot/share/interpreter/linkResolver.cpp, line 772]
Event: 14.666 Thread 0x00007f810f692790 Exception <a 'java/lang/NoSuchMethodError'{0x00000000c7cbe398}: 'double java.lang.invoke.DirectMethodHandle$Holder.invokeSpecial(java.lang.Object, java.lang.Object)'> (0x00000000c7cbe398)
thrown [src/hotspot/share/interpreter/linkResolver.cpp, line 772]
Event: 14.851 Thread 0x00007f81130310a0 Exception <a 'java/lang/NoSuchMethodError'{0x00000000c79687e0}: 'java.lang.Object java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(java.lang.Object, double, java.lang.Object)'> (0x00000000c79687e0)
thrown [src/hotspot/share/interpreter/linkResolver.cpp, line 772]
Event: 14.852 Thread 0x00007f81130310a0 Exception <a 'java/lang/NoSuchMethodError'{0x00000000c796e298}: 'java.lang.Object java.lang.invoke.DirectMethodHandle$Holder.newInvokeSpecial(java.lang.Object, double)'> (0x00000000c796e298)
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f817573343b, pid=1, tid=32
#
# JRE version: OpenJDK Runtime Environment Temurin-21.0.2+13 (21.0.2+13) (build 21.0.2+13-LTS)
# Java VM: OpenJDK 64-Bit Server VM Temurin-21.0.2+13 (21.0.2+13-LTS, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C [ld-musl-x86_64.so.1+0x5943b] strlen+0xd
#
# Core dump will be written. Default location: //core.1
#
# The JFR repository may contain useful JFR files. Location: /tmp/2024_04_10_10_37_03_1
#
# If you would like to submit a bug report, please visit:
# https://github.com/adoptium/adoptium-support/issues
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
--------------- S U M M A R Y ------------
Command Line: -javaagent:./agents/dd-java-agent.jar -XX:FlightRecorderOptions=stackdepth=256 -Dcom.sun.org.apache.xml.internal.security.ignoreLineBreaks=true -Dcom.ibm.mq.cfg.useIBMCipherMappings=false /app.jar
Host: Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz, 2 cores, 3G, Alpine Linux v3.19
Time: Wed Apr 10 10:38:04 2024 CEST elapsed time: 118.902752 seconds (0d 0h 1m 58s)
--------------- T H R E A D ---------------
Current thread (0x00007f8113100040): JavaThread "dd-profiler-recording-scheduler" daemon [_thread_in_native, id=32, stack(0x00007f811226a000,0x00007f811236aa78) (1026K)]
Stack: [0x00007f811226a000,0x00007f811236aa78], sp=0x00007f8112369af8, free space=1022k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [ld-musl-x86_64.so.1+0x5943b] strlen+0xd
C [libjavaProfiler1637923034457328543.so+0x18737] Recording::writeSettings(Buffer*, Arguments&)+0xd7
C [libjavaProfiler1637923034457328543.so+0x25b09] Recording::switchChunk(int)+0xf9
C [libjavaProfiler1637923034457328543.so+0x25bef] FlightRecorder::dump(char const*, int)+0x6f
C [libjavaProfiler1637923034457328543.so+0x3cfd1] Profiler::dump(char const*, int)+0x201
C [libjavaProfiler1637923034457328543.so+0x48f37] Java_com_datadoghq_profiler_JavaProfiler_dump0+0x47
j com.datadoghq.profiler.JavaProfiler.dump0(Ljava/lang/String;)V+0
j com.datadoghq.profiler.JavaProfiler.dump(Ljava/nio/file/Path;)V+11
j com.datadog.profiling.ddprof.DatadogProfiler.dump(Ljava/nio/file/Path;)V+5
j com.datadog.profiling.ddprof.DatadogProfilerRecording.snapshot(Ljava/time/Instant;Ldatadog/trace/api/profiling/ProfilingSnapshot$Kind;)Ldatadog/trace/api/profiling/RecordingData;+17
j com.datadog.profiling.controller.ddprof.DatadogProfilerOngoingRecording.snapshot(Ljava/time/Instant;Ldatadog/trace/api/profiling/ProfilingSnapshot$Kind;)Ldatadog/trace/api/profiling/RecordingData;+10
j com.datadog.profiling.agent.CompositeController$CompositeOngoingRecording.lambda$snapshot$0(Ljava/time/Instant;Ldatadog/trace/api/profiling/ProfilingSnapshot$Kind;Lcom/datadog/profiling/controller/OngoingRecording;)Ldatadog/trace/api/profiling/RecordingData;+3
j com.datadog.profiling.agent.CompositeController$CompositeOngoingRecording$$Lambda+0x00007f811b937ac8.apply(Ljava/lang/Object;)Ljava/lang/Object;+12
J 2779 c1 java.util.stream.ReferencePipeline$3$1.accept(Ljava/lang/Object;)V java.base@21.0.2 (23 bytes) @ 0x00007f815f48eedc [0x00007f815f48edc0+0x000000000000011c]
J 14957 c1 java.util.ArrayList$ArrayListSpliterator.forEachRemaining(Ljava/util/function/Consumer;)V java.base@21.0.2 (127 bytes) @ 0x00007f815f2c768c [0x00007f815f2c7400+0x000000000000028c]
J 14388 c2 java.util.stream.ReferencePipeline.collect(Ljava/util/stream/Collector;)Ljava/lang/Object; java.base@21.0.2 (124 bytes) @ 0x00007f81672162bc [0x00007f8167215fe0+0x00000000000002dc]
j com.datadog.profiling.agent.CompositeController$CompositeOngoingRecording.compose(Ljava/util/function/Function;)Ldatadog/trace/api/profiling/RecordingData;+22
j com.datadog.profiling.agent.CompositeController$CompositeOngoingRecording.snapshot(Ljava/time/Instant;Ldatadog/trace/api/profiling/ProfilingSnapshot$Kind;)Ldatadog/trace/api/profiling/RecordingData;+8
j com.datadog.profiling.controller.ProfilingSystem$SnapshotRecording.snapshot(Z)V+38
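Both excerpts crash at the same place: strlen+0xd in musl, called from Recording::writeSettings(Buffer*, Arguments&)+0xd7, with identical offsets further down the native stack. Only the numeric suffix of the extracted libjavaProfiler*.so file differs between the two runs. The following is a minimal sketch for checking this mechanically. NativeFrameSignature is a hypothetical helper, not part of the JDK or dd-trace-java, and it assumes (based only on these two logs) that the numeric suffix is random per JVM run and can be ignored when comparing crashes.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical helper: reduces hs_err native frames to a comparable signature. */
public final class NativeFrameSignature {

    // Matches a native frame line such as:
    //   C [libjavaProfiler2790011097090240288.so+0x18737] Recording::writeSettings(Buffer*, Arguments&)+0xd7
    // Group 1 = library file name, group 2 = symbol+offset.
    private static final Pattern NATIVE_FRAME =
        Pattern.compile("^C\\s+\\[([^+\\]]+)\\+0x[0-9a-f]+\\]\\s+(.+)$");

    // Assumption: the profiler library is extracted with a random numeric suffix,
    // so the digits are stripped to make the same library compare equal.
    private static final Pattern RANDOM_SUFFIX =
        Pattern.compile("(libjavaProfiler)\\d+(\\.so)");

    /** Keeps only native ("C") frames, normalized to "library symbol+offset". */
    public static List<String> signature(List<String> lines) {
        return lines.stream()
            .map(String::trim)
            .map(NATIVE_FRAME::matcher)
            .filter(Matcher::matches)
            .map(m -> RANDOM_SUFFIX.matcher(m.group(1)).replaceAll("$1$2")
                      + " " + m.group(2).trim())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Abbreviated frames copied from the two crash logs above.
        List<String> crash1 = List.of(
            "C [ld-musl-x86_64.so.1+0x5943b] strlen+0xd",
            "C [libjavaProfiler2790011097090240288.so+0x18737] Recording::writeSettings(Buffer*, Arguments&)+0xd7");
        List<String> crash2 = List.of(
            "C [ld-musl-x86_64.so.1+0x5943b] strlen+0xd",
            "C [libjavaProfiler1637923034457328543.so+0x18737] Recording::writeSettings(Buffer*, Arguments&)+0xd7",
            "j com.datadoghq.profiler.JavaProfiler.dump0(Ljava/lang/String;)V+0");
        // Java ("j") frames are ignored, so this prints "same native stack: true".
        System.out.println("same native stack: " + signature(crash1).equals(signature(crash2)));
    }
}
```

Matching on symbol+offset rather than raw addresses matters here because the absolute addresses (for example 0x00007f82be65e43b vs 0x00007f817573343b) change from run to run, while the offsets inside each library stay the same.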
Hi Henrik, my apologies for the inconvenience.
First, you don't need to disable profiling completely: DD_PROFILING_DDPROF_ENABLED=false disables only the profiling library where the crash occurs, while keeping JFR profiling and most of the profiling functionality active.
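Both switches mentioned in this thread follow dd-trace-java's convention that a DD_* environment variable has an equivalent dd.* system property, so DD_PROFILING_DDPROF_ENABLED=false corresponds to -Ddd.profiling.ddprof.enabled=false. The sketch below is a hypothetical helper, not part of the agent, that illustrates this name mapping. The resolution order it uses (system property first, then environment variable, then a caller-supplied default) is an assumption for illustration, and the defaults are placeholders rather than the agent's real defaults.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical helper, not part of dd-trace-java: illustrates the assumed
 * mapping between DD_* environment variables and dd.* system properties.
 */
public final class DdProfilingFlags {

    /** DD_PROFILING_DDPROF_ENABLED -> dd.profiling.ddprof.enabled (lower-case, '_' -> '.'). */
    public static String toSystemProperty(String envVar) {
        return envVar.toLowerCase(Locale.ROOT).replace('_', '.');
    }

    /**
     * Resolves a boolean flag. Assumed precedence: the system property wins,
     * then the environment variable, then the caller-supplied default.
     */
    public static boolean resolve(String envVar, Map<String, String> env, boolean defaultValue) {
        String value = System.getProperty(toSystemProperty(envVar));
        if (value == null) {
            value = env.get(envVar);
        }
        return value == null ? defaultValue : Boolean.parseBoolean(value.trim());
    }

    public static void main(String[] args) {
        // The broad switch from the report and the narrower one from this reply.
        for (String flag : new String[] {"DD_PROFILING_ENABLED", "DD_PROFILING_DDPROF_ENABLED"}) {
            // 'true' is a placeholder default for illustration only.
            boolean enabled = resolve(flag, System.getenv(), true);
            System.out.println(flag + " / -D" + toSystemProperty(flag) + " -> " + enabled);
        }
    }
}
```

Under the same naming assumption, adding -Ddd.profiling.ddprof.enabled=false to the JVM options in the Command Line above (before the application jar) should have the same effect as setting the environment variable in the container.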
I wonder if you could attach the full hs_err.log file here. The code is heavily inlined, and having all the information from the crash log would help us diagnose the root cause.
Thanks for your understanding.
Hi @jbachorik,
Thanks for the quick reply! I have the log file for this error ready; however, I would prefer a private communication channel, since I can't vouch for all the information it contains. Is there an alternative way for me to share the file?
Hi @henrik-sandberg, feel free to open a support ticket and then give me the ticket number. I should be able to follow up.
Thank you @jbachorik, I have created #1645495. Please let me know if I can assist further in any way.