apm-server icon indicating copy to clipboard operation
apm-server copied to clipboard

processor/otel: Java exceptions thrown by Spring Reactive are not parsed correctly

Open axw opened this issue 3 years ago • 5 comments
trafficstars

APM Server version (apm-server version): 7.15.2

Description of the problem including expected versus actual behavior:

Exceptions thrown in Spring Reactive application code, when captured by OpenTelemetry Java auto instrumentation, are not parsed correctly by the processor/otel.

Steps to reproduce:

See https://discuss.elastic.co/t/stack-trace-is-not-parsed/292179

Sample exception:

org.springframework.web.reactive.function.client.WebClientRequestException: Connection refused: no further information: localhost/127.0.0.1:8090; nested exception is io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: no further information: localhost/127.0.0.1:8090
	at org.springframework.web.reactive.function.client.ExchangeFunctions$DefaultExchangeFunction.lambda$wrapException$9(ExchangeFunctions.java:141)
	Suppressed: The stacktrace has been enhanced by Reactor, refer to additional information below: 
Error has been observed at the following site(s):
	*__checkpoint ⇢ Request to GET null [DefaultWebClient]
Original Stack Trace:
		at org.springframework.web.reactive.function.client.ExchangeFunctions$DefaultExchangeFunction.lambda$wrapException$9(ExchangeFunctions.java:141)
		at reactor.core.publisher.MonoErrorSupplied.subscribe(MonoErrorSupplied.java:55)
		at reactor.core.publisher.Mono.subscribe(Mono.java:4400)
		at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:103)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
		at reactor.core.publisher.FluxPeek$PeekSubscriber.onError(FluxPeek.java:222)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
		at reactor.core.publisher.FluxPeek$PeekSubscriber.onError(FluxPeek.java:222)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
		at reactor.core.publisher.FluxPeek$PeekSubscriber.onError(FluxPeek.java:222)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
		at reactor.core.publisher.MonoNext$NextSubscriber.onError(MonoNext.java:93)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
		at reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain.onError(MonoFlatMapMany.java:204)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
		at reactor.core.publisher.FluxContextWrite$ContextWriteSubscriber.onError(FluxContextWrite.java:121)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
		at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:106)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
		at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.onError(MonoIgnoreThen.java:270)
		at reactor.core.publisher.Operators.error(Operators.java:198)
		at reactor.core.publisher.MonoError.subscribe(MonoError.java:53)
		at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:64)
		at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.subscribeNext(MonoIgnoreThen.java:236)
		at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.onComplete(MonoIgnoreThen.java:203)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onComplete(TracingSubscriber.java:68)
		at reactor.core.publisher.FluxPeek$PeekSubscriber.onComplete(FluxPeek.java:260)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onComplete(TracingSubscriber.java:68)
		at reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2400)
		at reactor.core.publisher.FluxPeek$PeekSubscriber.request(FluxPeek.java:138)
		at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.onSubscribe(MonoIgnoreThen.java:134)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onSubscribe$0(TracingSubscriber.java:53)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onSubscribe(TracingSubscriber.java:53)
		at reactor.core.publisher.FluxPeek$PeekSubscriber.onSubscribe(FluxPeek.java:171)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onSubscribe$0(TracingSubscriber.java:53)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onSubscribe(TracingSubscriber.java:53)
		at reactor.core.publisher.MonoJust.subscribe(MonoJust.java:55)
		at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:64)
		at reactor.core.publisher.MonoDeferContextual.subscribe(MonoDeferContextual.java:55)
		at reactor.core.publisher.Mono.subscribe(Mono.java:4400)
		at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.subscribeNext(MonoIgnoreThen.java:255)
		at reactor.core.publisher.MonoIgnoreThen.subscribe(MonoIgnoreThen.java:51)
		at reactor.core.publisher.Mono.subscribe(Mono.java:4400)
		at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:103)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
		at reactor.core.publisher.SerializedSubscriber.onError(SerializedSubscriber.java:124)
		at reactor.core.publisher.FluxRetryWhen$RetryWhenMainSubscriber.whenError(FluxRetryWhen.java:225)
		at reactor.core.publisher.FluxRetryWhen$RetryWhenOtherSubscriber.onError(FluxRetryWhen.java:274)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
		at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.innerError(FluxConcatMap.java:309)
		at reactor.core.publisher.FluxConcatMap$ConcatMapInner.onError(FluxConcatMap.java:873)
		at reactor.core.publisher.Operators.error(Operators.java:198)
		at reactor.core.publisher.MonoError.subscribe(MonoError.java:53)
		at reactor.core.publisher.Mono.subscribe(Mono.java:4400)
		at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.drain(FluxConcatMap.java:449)
		at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.onNext(FluxConcatMap.java:251)
		at reactor.core.publisher.EmitterProcessor.drain(EmitterProcessor.java:491)
		at reactor.core.publisher.EmitterProcessor.tryEmitNext(EmitterProcessor.java:299)
		at reactor.core.publisher.SinkManySerialized.tryEmitNext(SinkManySerialized.java:100)
		at reactor.core.publisher.InternalManySink.emitNext(InternalManySink.java:27)
		at reactor.core.publisher.FluxRetryWhen$RetryWhenMainSubscriber.onError(FluxRetryWhen.java:190)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
		at reactor.core.publisher.MonoCreate$DefaultMonoSink.error(MonoCreate.java:194)
		at reactor.netty.http.client.HttpClientConnect$MonoHttpConnect$ClientTransportSubscriber.onError(HttpClientConnect.java:304)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
		at reactor.core.publisher.MonoCreate$DefaultMonoSink.error(MonoCreate.java:194)
		at reactor.netty.resources.DefaultPooledConnectionProvider$DisposableAcquire.onError(DefaultPooledConnectionProvider.java:172)
		at reactor.netty.internal.shaded.reactor.pool.AbstractPool$Borrower.fail(AbstractPool.java:477)
		at reactor.netty.internal.shaded.reactor.pool.SimpleDequePool.lambda$drainLoop$9(SimpleDequePool.java:401)
		at reactor.core.publisher.FluxDoOnEach$DoOnEachSubscriber.onError(FluxDoOnEach.java:186)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
		at reactor.core.publisher.MonoCreate$DefaultMonoSink.error(MonoCreate.java:194)
		at reactor.netty.resources.DefaultPooledConnectionProvider$PooledConnectionAllocator$PooledConnectionInitializer.onError(DefaultPooledConnectionProvider.java:554)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
		at reactor.core.publisher.MonoFlatMap$FlatMapMain.secondError(MonoFlatMap.java:192)
		at reactor.core.publisher.MonoFlatMap$FlatMapInner.onError(MonoFlatMap.java:259)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
		at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:106)
		at reactor.core.publisher.Operators.error(Operators.java:198)
		at reactor.core.publisher.MonoError.subscribe(MonoError.java:53)
		at reactor.core.publisher.Mono.subscribe(Mono.java:4400)
		at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:103)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
		at reactor.core.publisher.FluxPeekFuseable$PeekFuseableSubscriber.onError(FluxPeekFuseable.java:234)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
		at reactor.core.publisher.FluxHide$SuppressFuseableSubscriber.onError(FluxHide.java:142)
		at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.onError(MonoPeekTerminal.java:258)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.lambda$onError$2(TracingSubscriber.java:63)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.withActiveSpan(TracingSubscriber.java:79)
		at io.opentelemetry.javaagent.shaded.instrumentation.reactor.TracingSubscriber.onError(TracingSubscriber.java:63)
		at reactor.core.publisher.FluxHide$SuppressFuseableSubscriber.onError(FluxHide.java:142)
		at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.onError(MonoPeekTerminal.java:258)
		at reactor.netty.transport.TransportConnector$MonoChannelPromise.tryFailure(TransportConnector.java:517)
		at reactor.netty.transport.TransportConnector$MonoChannelPromise.setFailure(TransportConnector.java:471)
		at reactor.netty.transport.TransportConnector.lambda$doConnect$7(TransportConnector.java:206)
		at io.opentelemetry.javaagent.instrumentation.netty.common.FutureListenerWrappers$WrappedFutureListener.operationComplete(FutureListenerWrappers.java:106)
		at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)
		at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571)
		at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550)
		at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491)
		at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616)
		at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:609)
		at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117)
		at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:321)
		at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:337)
		at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:707)
		at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
		at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
		at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
		at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
		at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
		at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
		at java.base/java.lang.Thread.run(Thread.java:834)
	Suppressed: java.lang.Exception: #block terminated with an error
		at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:99)
		at reactor.core.publisher.Mono.block(Mono.java:1707)
		at ExceptionTest.testSpan(ExceptionTest.java:22)
		at ExceptionTest.main(ExceptionTest.java:13)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: no further information: localhost/127.0.0.1:8090
Caused by: java.net.ConnectException: Connection refused: no further information
	at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:779)
	at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:707)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:834)

axw avatar Dec 17 '21 09:12 axw

The issue appears to be with these lines:

Error has been observed at the following site(s):
	*__checkpoint ⇢ Request to GET null [DefaultWebClient]
Original Stack Trace:

This doesn't seem to be handled by the parser, or mentioned as a possibility in the stacktrace output used to model the parser: https://docs.oracle.com/javase/7/docs/api/java/lang/Throwable.html#printStackTrace()

stuartnelson3 avatar Dec 29 '21 11:12 stuartnelson3

This weird stack trace is a debugging feature in Reactor. It helps to debug in an async execution chain. More details here: https://projectreactor.io/docs/core/release/reference/#_reading_a_stack_trace_in_debug_mode

In this situation, I think it's fair to fall back to

  • elastic/kibana#124204

felixbarny avatar Feb 01 '22 09:02 felixbarny

@felixbarny can we close this issue then and avoid specific handling for this edge case?

simitt avatar Feb 04 '22 13:02 simitt

I think there are two valid courses of action:

  • Close it as "wont fix". It's not that it's entirely impossible technically but probably quite an edge case and the fallback of showing the unformatted stack trace seems good enough. If it turns out that this continues to be an issue for people, we can re-evaluate.
  • Make the parsing more resilient and anticipate the exception message to have line breaks. I think this is what reactor does. It's a regular exception stack trace with a fancy message that has line breaks. A rule could be that the message contains everything before the first stack frame. A stack frame is identified by the regex \t+at.

felixbarny avatar Feb 04 '22 15:02 felixbarny

Making it more resilient without implementing a specific handling for Spring Reactive sounds like the right thing to do; let's keep it open then.

simitt avatar Feb 08 '22 17:02 simitt