
dd-trace-java v1.31.2 crashes the JVM

Open henrik-sandberg opened this issue 1 year ago • 4 comments

Hi DD team,

We've experienced JVM crashes caused by a segmentation fault, with references to the DD profiler in the crash log. It's a containerized application running the following versions:

Tracer version: v1.31.2
Java VM: OpenJDK 64-Bit Server VM Temurin-21.0.2+13
OS: Alpine Linux v3.19
Datadog agent: 7.42.1

The JVM has mostly crashed within the first two minutes after start, but on rarer occasions it has crashed up to 6 hours after application start.

I will make sure to bring the versions up to date, and in the meantime we'll disable profiling with DD_PROFILING_ENABLED=false.

Attached are two log entries.

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f82be65e43b, pid=1, tid=31
#
# JRE version: OpenJDK Runtime Environment Temurin-21.0.2+13 (21.0.2+13) (build 21.0.2+13-LTS)
# Java VM: OpenJDK 64-Bit Server VM Temurin-21.0.2+13 (21.0.2+13-LTS, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C  [ld-musl-x86_64.so.1+0x5943b]  strlen+0xd
#
# Core dump will be written. Default location: //core.1
#
# The JFR repository may contain useful JFR files. Location: /tmp/2024_04_10_09_23_20_1
#
# If you would like to submit a bug report, please visit:
#   https://github.com/adoptium/adoptium-support/issues
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
---------------  S U M M A R Y ------------
Command Line: -javaagent:./agents/dd-java-agent.jar -XX:FlightRecorderOptions=stackdepth=256 -Dcom.sun.org.apache.xml.internal.security.ignoreLineBreaks=true -Dcom.ibm.mq.cfg.useIBMCipherMappings=false /app.jar
Host: Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz, 2 cores, 3G, Alpine Linux v3.19
Time: Wed Apr 10 09:24:21 2024 CEST elapsed time: 127.619058 seconds (0d 0h 2m 7s)
---------------  T H R E A D  ---------------
Current thread (0x00007f825afce880):  JavaThread “dd-profiler-recording-scheduler” daemon [_thread_in_native, id=31, stack(0x00007f825ad2f000,0x00007f825ae2fa78) (1026K)]
Stack: [0x00007f825ad2f000,0x00007f825ae2fa78],  sp=0x00007f825ae2eb48,  free space=1022k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [ld-musl-x86_64.so.1+0x5943b]  strlen+0xd
C  [libjavaProfiler2790011097090240288.so+0x18737]  Recording::writeSettings(Buffer*, Arguments&)+0xd7
C  [libjavaProfiler2790011097090240288.so+0x25b09]  Recording::switchChunk(int)+0xf9
C  [libjavaProfiler2790011097090240288.so+0x25bef]  FlightRecorder::dump(char const*, int)+0x6f
C  [libjavaProfiler2790011097090240288.so+0x3cfd1]  Profiler::dump(char const*, int)+0x201
C  [libjavaProfiler2790011097090240288.so+0x48f37]  Java_com_datadoghq_profiler_JavaProfiler_dump0+0x47
Event: 11.935 Thread 0x00007f8174166820 Exception <a ‘java/lang/IncompatibleClassChangeError’{0x00000000c71b2228}: Found class java.lang.Object, but interface was expected> (0x00000000c71b2228)
thrown [src/hotspot/share/interpreter/linkResolver.cpp, line 840]
Event: 11.978 Thread 0x00007f81131017c0 Exception <a ‘java/lang/NoSuchMethodError’{0x00000000c944ba80}: ‘long java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(java.lang.Object, java.lang.Object, java.lang.Object)’> (0x00000000c944ba80)
thrown [src/hotspot/share/interpreter/linkResolver.cpp, line 772]
Event: 12.024 Thread 0x00007f81131017c0 Exception <a ‘java/lang/NoSuchMethodError’{0x00000000c91ecb48}: ‘java.lang.Object java.lang.invoke.DirectMethodHandle$Holder.newInvokeSpecial(java.lang.Object, java.lang.Object, java.lang.Object, java.lang.Object, java.lang.Object, java.lang.Object, java.lang.Object, int)’> (0x00000000c91ecb48)
thrown [src/hotspot/share/interpreter/linkResolver.cpp, line 772]
Event: 12.025 Thread 0x00007f81131017c0 Exception <a ‘java/lang/NoSuchMethodError’{0x00000000c9083eb0}: ‘java.lang.Object java.lang.invoke.DirectMethodHandle$Holder.invokeSpecial(java.lang.Object, java.lang.Object, java.lang.Object, java.lang.Object, java.lang.Object, java.lang.Object, java.lang.Object, java.lang.Object, int)’> (0x00000000c9083eb0)
thrown [src/hotspot/share/interpreter/linkResolver.cpp, line 772]
Event: 14.664 Thread 0x00007f810f692790 Exception <a ‘java/lang/NoSuchMethodError’{0x00000000c7c99770}: ‘double java.lang.invoke.DelegatingMethodHandle$Holder.reinvoke_L(java.lang.Object, java.lang.Object)’> (0x00000000c7c99770)
thrown [src/hotspot/share/interpreter/linkResolver.cpp, line 772]
Event: 14.666 Thread 0x00007f810f692790 Exception <a ‘java/lang/NoSuchMethodError’{0x00000000c7cbe398}: ‘double java.lang.invoke.DirectMethodHandle$Holder.invokeSpecial(java.lang.Object, java.lang.Object)’> (0x00000000c7cbe398)
thrown [src/hotspot/share/interpreter/linkResolver.cpp, line 772]
Event: 14.851 Thread 0x00007f81130310a0 Exception <a ‘java/lang/NoSuchMethodError’{0x00000000c79687e0}: ‘java.lang.Object java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(java.lang.Object, double, java.lang.Object)’> (0x00000000c79687e0)
thrown [src/hotspot/share/interpreter/linkResolver.cpp, line 772]
Event: 14.852 Thread 0x00007f81130310a0 Exception <a ‘java/lang/NoSuchMethodError’{0x00000000c796e298}: ‘java.lang.Object java.lang.invoke.DirectMethodHandle$Holder.newInvokeSpecial(java.lang.Object, double)’> (0x00000000c796e298)
==============
dd-profiler-recording-scheduler
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f817573343b, pid=1, tid=32
#
# JRE version: OpenJDK Runtime Environment Temurin-21.0.2+13 (21.0.2+13) (build 21.0.2+13-LTS)
# Java VM: OpenJDK 64-Bit Server VM Temurin-21.0.2+13 (21.0.2+13-LTS, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C  [ld-musl-x86_64.so.1+0x5943b]  strlen+0xd
#
# Core dump will be written. Default location: //core.1
#
# The JFR repository may contain useful JFR files. Location: /tmp/2024_04_10_10_37_03_1
#
# If you would like to submit a bug report, please visit:
#   https://github.com/adoptium/adoptium-support/issues
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
---------------  S U M M A R Y ------------
Command Line: -javaagent:./agents/dd-java-agent.jar -XX:FlightRecorderOptions=stackdepth=256 -Dcom.sun.org.apache.xml.internal.security.ignoreLineBreaks=true -Dcom.ibm.mq.cfg.useIBMCipherMappings=false /app.jar
Host: Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz, 2 cores, 3G, Alpine Linux v3.19
Time: Wed Apr 10 10:38:04 2024 CEST elapsed time: 118.902752 seconds (0d 0h 1m 58s)
---------------  T H R E A D  ---------------
Current thread (0x00007f8113100040):  JavaThread “dd-profiler-recording-scheduler” daemon [_thread_in_native, id=32, stack(0x00007f811226a000,0x00007f811236aa78) (1026K)]
Stack: [0x00007f811226a000,0x00007f811236aa78],  sp=0x00007f8112369af8,  free space=1022k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [ld-musl-x86_64.so.1+0x5943b]  strlen+0xd
C  [libjavaProfiler1637923034457328543.so+0x18737]  Recording::writeSettings(Buffer*, Arguments&)+0xd7
C  [libjavaProfiler1637923034457328543.so+0x25b09]  Recording::switchChunk(int)+0xf9
C  [libjavaProfiler1637923034457328543.so+0x25bef]  FlightRecorder::dump(char const*, int)+0x6f
C  [libjavaProfiler1637923034457328543.so+0x3cfd1]  Profiler::dump(char const*, int)+0x201
C  [libjavaProfiler1637923034457328543.so+0x48f37]  Java_com_datadoghq_profiler_JavaProfiler_dump0+0x47
j  com.datadoghq.profiler.JavaProfiler.dump0(Ljava/lang/String;)V+0
j  com.datadoghq.profiler.JavaProfiler.dump(Ljava/nio/file/Path;)V+11
j  com.datadog.profiling.ddprof.DatadogProfiler.dump(Ljava/nio/file/Path;)V+5
j  com.datadog.profiling.ddprof.DatadogProfilerRecording.snapshot(Ljava/time/Instant;Ldatadog/trace/api/profiling/ProfilingSnapshot$Kind;)Ldatadog/trace/api/profiling/RecordingData;+17
j  com.datadog.profiling.controller.ddprof.DatadogProfilerOngoingRecording.snapshot(Ljava/time/Instant;Ldatadog/trace/api/profiling/ProfilingSnapshot$Kind;)Ldatadog/trace/api/profiling/RecordingData;+10
j  com.datadog.profiling.agent.CompositeController$CompositeOngoingRecording.lambda$snapshot$0(Ljava/time/Instant;Ldatadog/trace/api/profiling/ProfilingSnapshot$Kind;Lcom/datadog/profiling/controller/OngoingRecording;)Ldatadog/trace/api/profiling/RecordingData;+3
j  com.datadog.profiling.agent.CompositeController$CompositeOngoingRecording$$Lambda+0x00007f811b937ac8.apply(Ljava/lang/Object;)Ljava/lang/Object;+12
J 2779 c1 java.util.stream.ReferencePipeline$3$1.accept(Ljava/lang/Object;)V java.base@21.0.2 (23 bytes) @ 0x00007f815f48eedc [0x00007f815f48edc0+0x000000000000011c]
J 14957 c1 java.util.ArrayList$ArrayListSpliterator.forEachRemaining(Ljava/util/function/Consumer;)V java.base@21.0.2 (127 bytes) @ 0x00007f815f2c768c [0x00007f815f2c7400+0x000000000000028c]
J 14388 c2 java.util.stream.ReferencePipeline.collect(Ljava/util/stream/Collector;)Ljava/lang/Object; java.base@21.0.2 (124 bytes) @ 0x00007f81672162bc [0x00007f8167215fe0+0x00000000000002dc]
j  com.datadog.profiling.agent.CompositeController$CompositeOngoingRecording.compose(Ljava/util/function/Function;)Ldatadog/trace/api/profiling/RecordingData;+22
j  com.datadog.profiling.agent.CompositeController$CompositeOngoingRecording.snapshot(Ljava/time/Instant;Ldatadog/trace/api/profiling/ProfilingSnapshot$Kind;)Ldatadog/trace/api/profiling/RecordingData;+8
j  com.datadog.profiling.controller.ProfilingSystem$SnapshotRecording.snapshot(Z)V+38
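Both logs above appear to fail at the same place (strlen called from Recording::writeSettings), even though the temporary profiler library name differs between runs. A minimal sketch of how that could be checked mechanically, assuming the "Native frames:" layout shown in these logs; the function name `crash_signature` and the digit-masking rule are illustrative, not part of any Datadog or JDK tooling:

```python
import re

# Matches native frames such as:
#   C  [ld-musl-x86_64.so.1+0x5943b]  strlen+0xd
NATIVE_FRAME = re.compile(
    r"^C\s+\[(?P<lib>.+?)\+0x[0-9a-f]+\]\s+(?P<symbol>.+?)\+0x[0-9a-f]+$"
)


def crash_signature(log_text, depth=3):
    """Return the top `depth` native frames as (library, symbol) pairs.

    Long digit runs in library names are masked, because the profiler
    library is extracted under a different random name on each start.
    """
    frames = []
    in_stack = False
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("Native frames:"):
            in_stack = True
            continue
        if not in_stack:
            continue
        match = NATIVE_FRAME.match(line)
        if match is None:
            break  # first non-native line ends the native part of the stack
        lib = re.sub(r"\d{6,}", "*", match.group("lib"))
        frames.append((lib, match.group("symbol")))
        if len(frames) == depth:
            break
    return frames


# Abbreviated excerpts from the two logs in this issue.
first = """Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [ld-musl-x86_64.so.1+0x5943b]  strlen+0xd
C  [libjavaProfiler2790011097090240288.so+0x18737]  Recording::writeSettings(Buffer*, Arguments&)+0xd7
"""
second = """Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [ld-musl-x86_64.so.1+0x5943b]  strlen+0xd
C  [libjavaProfiler1637923034457328543.so+0x18737]  Recording::writeSettings(Buffer*, Arguments&)+0xd7
"""

same_crash = crash_signature(first) == crash_signature(second)
```

Comparing signatures this way only says the two stacks look alike; it does not identify the root cause.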

henrik-sandberg avatar Apr 11 '24 09:04 henrik-sandberg

Hi Henrik, my apologies for the inconvenience.

First, you don't need to disable profiling completely - DD_PROFILING_DDPROF_ENABLED=false will disable the profiling library where the crash occurs, still keeping the JFR profiling and most of the profiling functionality active.
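For anyone who passes settings as JVM flags rather than environment variables, here is a small sketch of the equivalent flag. It assumes the usual dd-trace-java naming convention (environment variable lowercased, underscores replaced by dots); the helper names are hypothetical, so please check the resulting property against the tracer documentation:

```python
def env_to_system_property(env_name):
    """Map e.g. DD_PROFILING_DDPROF_ENABLED to dd.profiling.ddprof.enabled.

    Assumption: the tracer follows the lowercase-and-dots convention
    for the system-property form of a setting.
    """
    return env_name.lower().replace("_", ".")


def jvm_flag(env_name, value):
    """Build the -D flag corresponding to an environment variable setting."""
    return f"-D{env_to_system_property(env_name)}={value}"


# Disables only the native profiling library, as suggested above.
narrow = jvm_flag("DD_PROFILING_DDPROF_ENABLED", "false")
# Disables profiling entirely, as in the original workaround.
broad = jvm_flag("DD_PROFILING_ENABLED", "false")
```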

I wonder if you could attach the full hs_err.log file here. The code is heavily inlined and having all information from the crash log would help us to diagnose the root cause.

Thanks for understanding.

jbachorik avatar Apr 11 '24 09:04 jbachorik

Hi @jbachorik,

Thanks for the quick reply! I have a log file for this error ready, but I would prefer a closed communication channel, since I can't vouch for the information it contains. Is there another way I can share the file?
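If a private channel were not available, one option would be to mask the most obviously sensitive parts of the log before sharing it. The following is only a rough sketch, assuming the usual hs_err layout with an "Environment Variables:" section terminated by a blank line and a "Command Line:" line; `redact_hs_err` is a hypothetical helper, and the output would still need a manual review because other sections can also carry private data:

```python
import re


def redact_hs_err(text):
    """Mask environment variable values and -D property values in an hs_err log."""
    out = []
    in_env = False
    for line in text.splitlines():
        if line.startswith("Environment Variables:"):
            in_env = True
            out.append(line)
            continue
        if in_env:
            if not line.strip():
                in_env = False  # a blank line ends the section
            elif "=" in line:
                name = line.split("=", 1)[0]
                line = f"{name}=<redacted>"
        elif line.startswith("Command Line:"):
            # Keep property names, hide their values.
            line = re.sub(r"(-D[^=\s]+)=\S+", r"\1=<redacted>", line)
        out.append(line)
    return "\n".join(out)


sample = """Command Line: -javaagent:./agents/dd-java-agent.jar -Dcom.ibm.mq.cfg.useIBMCipherMappings=false /app.jar
Environment Variables:
PATH=/usr/bin
DD_API_KEY=not-a-real-key

Signal Handlers:"""

redacted = redact_hs_err(sample)
```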

henrik-sandberg avatar Apr 11 '24 13:04 henrik-sandberg

Hi @henrik-sandberg - feel free to open a support ticket and then give me the ticket number. I should be able to follow up.

jbachorik avatar Apr 11 '24 13:04 jbachorik

Thank you @jbachorik - I have created #1645495. Please let me know if I can assist further in any way.

henrik-sandberg avatar Apr 12 '24 06:04 henrik-sandberg