apm-server
apm-server copied to clipboard
processor/otel: Java exceptions thrown by Spring Reactive are not parsed correctly
APM Server version (apm-server version): 7.15.2
Description of the problem including expected versus actual behavior:
Exceptions thrown in Spring Reactive application code, when captured by OpenTelemetry Java auto instrumentation, are not parsed correctly by the processor/otel.
Steps to reproduce:
See https://discuss.elastic.co/t/stack-trace-is-not-parsed/292179
Sample exception:
org.springframework.web.reactive.function.client.WebClientRequestException: Connection refused: no further information: localhost/127.0.0.1:8090; nested exception is io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: no further information: localhost/127.0.0.1:8090
at org.springframework.web.reactive.function.client.ExchangeFunctions$DefaultExchangeFunction.lambda$wrapException$9(ExchangeFunctions.java:141)
Suppressed: The stacktrace has been enhanced by Reactor, refer to additional information below:
Error has been observed at the following site(s):
*__checkpoint ⇢ Request to GET null [DefaultWebClient]
Original Stack Trace:
at org.springframework.web.reactive.function.client.ExchangeFunctions$DefaultExchangeFunction.lambda$wrapException$9(ExchangeFunctions.java:141)
at reactor.core.publisher.MonoErrorSupplied.subscribe(MonoErrorSupplied.java:55)
at reactor.core.publisher.Mono.subscribe(Mono.java:4400)
at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:103)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
at reactor.core.publisher.FluxPeek$PeekSubscriber.onError(FluxPeek.java:222)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
at reactor.core.publisher.FluxPeek$PeekSubscriber.onError(FluxPeek.java:222)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
at reactor.core.publisher.FluxPeek$PeekSubscriber.onError(FluxPeek.java:222)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
at reactor.core.publisher.MonoNext$NextSubscriber.onError(MonoNext.java:93)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
at reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain.onError(MonoFlatMapMany.java:204)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
at reactor.core.publisher.FluxContextWrite$ContextWriteSubscriber.onError(FluxContextWrite.java:121)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:106)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.onError(MonoIgnoreThen.java:270)
at reactor.core.publisher.Operators.error(Operators.java:198)
at reactor.core.publisher.MonoError.subscribe(MonoError.java:53)
at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:64)
at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.subscribeNext(MonoIgnoreThen.java:236)
at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.onComplete(MonoIgnoreThen.java:203)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onComplete(TracingSubscriber.java:68)
at reactor.core.publisher.FluxPeek$PeekSubscriber.onComplete(FluxPeek.java:260)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onComplete(TracingSubscriber.java:68)
at reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2400)
at reactor.core.publisher.FluxPeek$PeekSubscriber.request(FluxPeek.java:138)
at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.onSubscribe(MonoIgnoreThen.java:134)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onSubscribe$0(TracingSubscriber.java:53)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onSubscribe(TracingSubscriber.java:53)
at reactor.core.publisher.FluxPeek$PeekSubscriber.onSubscribe(FluxPeek.java:171)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onSubscribe$0(TracingSubscriber.java:53)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onSubscribe(TracingSubscriber.java:53)
at reactor.core.publisher.MonoJust.subscribe(MonoJust.java:55)
at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:64)
at reactor.core.publisher.MonoDeferContextual.subscribe(MonoDeferContextual.java:55)
at reactor.core.publisher.Mono.subscribe(Mono.java:4400)
at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.subscribeNext(MonoIgnoreThen.java:255)
at reactor.core.publisher.MonoIgnoreThen.subscribe(MonoIgnoreThen.java:51)
at reactor.core.publisher.Mono.subscribe(Mono.java:4400)
at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:103)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
at reactor.core.publisher.SerializedSubscriber.onError(SerializedSubscriber.java:124)
at reactor.core.publisher.FluxRetryWhen$RetryWhenMainSubscriber.whenError(FluxRetryWhen.java:225)
at reactor.core.publisher.FluxRetryWhen$RetryWhenOtherSubscriber.onError(FluxRetryWhen.java:274)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.innerError(FluxConcatMap.java:309)
at reactor.core.publisher.FluxConcatMap$ConcatMapInner.onError(FluxConcatMap.java:873)
at reactor.core.publisher.Operators.error(Operators.java:198)
at reactor.core.publisher.MonoError.subscribe(MonoError.java:53)
at reactor.core.publisher.Mono.subscribe(Mono.java:4400)
at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.drain(FluxConcatMap.java:449)
at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.onNext(FluxConcatMap.java:251)
at reactor.core.publisher.EmitterProcessor.drain(EmitterProcessor.java:491)
at reactor.core.publisher.EmitterProcessor.tryEmitNext(EmitterProcessor.java:299)
at reactor.core.publisher.SinkManySerialized.tryEmitNext(SinkManySerialized.java:100)
at reactor.core.publisher.InternalManySink.emitNext(InternalManySink.java:27)
at reactor.core.publisher.FluxRetryWhen$RetryWhenMainSubscriber.onError(FluxRetryWhen.java:190)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
at reactor.core.publisher.MonoCreate$DefaultMonoSink.error(MonoCreate.java:194)
at reactor.netty.http.client.HttpClientConnect$MonoHttpConnect$ClientTransportSubscriber.onError(HttpClientConnect.java:304)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
at reactor.core.publisher.MonoCreate$DefaultMonoSink.error(MonoCreate.java:194)
at reactor.netty.resources.DefaultPooledConnectionProvider$DisposableAcquire.onError(DefaultPooledConnectionProvider.java:172)
at reactor.netty.internal.shaded.reactor.pool.AbstractPool$Borrower.fail(AbstractPool.java:477)
at reactor.netty.internal.shaded.reactor.pool.SimpleDequePool.lambda$drainLoop$9(SimpleDequePool.java:401)
at reactor.core.publisher.FluxDoOnEach$DoOnEachSubscriber.onError(FluxDoOnEach.java:186)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
at reactor.core.publisher.MonoCreate$DefaultMonoSink.error(MonoCreate.java:194)
at reactor.netty.resources.DefaultPooledConnectionProvider$PooledConnectionAllocator$PooledConnectionInitializer.onError(DefaultPooledConnectionProvider.java:554)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
at reactor.core.publisher.MonoFlatMap$FlatMapMain.secondError(MonoFlatMap.java:192)
at reactor.core.publisher.MonoFlatMap$FlatMapInner.onError(MonoFlatMap.java:259)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:106)
at reactor.core.publisher.Operators.error(Operators.java:198)
at reactor.core.publisher.MonoError.subscribe(MonoError.java:53)
at reactor.core.publisher.Mono.subscribe(Mono.java:4400)
at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:103)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
at reactor.core.publisher.FluxPeekFuseable$PeekFuseableSubscriber.onError(FluxPeekFuseable.java:234)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
at reactor.core.publisher.FluxHide$SuppressFuseableSubscriber.onError(FluxHide.java:142)
at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.onError(MonoPeekTerminal.java:258)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
at reactor.core.publisher.FluxHide$SuppressFuseableSubscriber.onError(FluxHide.java:142)
at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.onError(MonoPeekTerminal.java:258)
at reactor.netty.transport.TransportConnector$MonoChannelPromise.tryFailure(TransportConnector.java:517)
at reactor.netty.transport.TransportConnector$MonoChannelPromise.setFailure(TransportConnector.java:471)
at reactor.netty.transport.TransportConnector.lambda$doConnect$7(TransportConnector.java:206)
at io.opentelemetry.javaagent.instrumentation.netty.common.FutureListenerWrappers$WrappedFutureListener.operationComplete(FutureListenerWrappers.java:106)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)
at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571)
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491)
at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616)
at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:609)
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:321)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:337)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:707)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:834)
Suppressed: java.lang.Exception: #block terminated with an error
at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:99)
at reactor.core.publisher.Mono.block(Mono.java:1707)
at ExceptionTest.testSpan(ExceptionTest.java:22)
at ExceptionTest.main(ExceptionTest.java:13)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: no further information: localhost/127.0.0.1:8090
Caused by: java.net.ConnectException: Connection refused: no further information
at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:779)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:707)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:834)
The issue appears to be with these lines:
Error has been observed at the following site(s):
*__checkpoint ⇢ Request to GET null [DefaultWebClient]
Original Stack Trace:
This doesn't seem to be handled by the parser, or mentioned as a possibility in the stacktrace output used to model the parser: https://docs.oracle.com/javase/7/docs/api/java/lang/Throwable.html#printStackTrace()
This weird stack trace is a debugging feature in Reactor. It helps to debug in an async execution chain. More details here: https://projectreactor.io/docs/core/release/reference/#_reading_a_stack_trace_in_debug_mode
In this situation, I think it's fair to fall back to
- elastic/kibana#124204
@felixbarny can we close this issue then and avoid specific handling for this edge case?
I think there are two valid courses of action:
- Close it as "wont fix". It's not that it's entirely impossible technically but probably quite an edge case and the fallback of showing the unformatted stack trace seems good enough. If it turns out that this continues to be an issue for people, we can re-evaluate.
- Make the parsing more resilient and anticipate the exception message to have line breaks. I think this is what reactor does. It's a regular exception stack trace with a fancy message that has line breaks. A rule could be that the message contains everything before the first stack frame. A stack frame is identified by the regex
\t+at.
Making it more resilient without implementing a specific handling for Spring Reactive sounds like the right thing to do; let's keep it open then.