Kamon
Kamon copied to clipboard
UnsupportedOperationException when reading trace context from kafla
UnsupportedOperationException when reading trace context from kafla
running kafka 2.8 with compression
Only occurs when tracing is set to continue
kamon.instrumentation.kafka {
client.tracing {
continue-trace-on-consumer = yes
}
}
Error reading the context from the header.
I think the error is on this line when the readAll can't be invoked on the reader but it could also be any other UnsupportedOperation on the context here: https://github.com/kamon-io/Kamon/blob/master/core/kamon-core/src/main/scala/kamon/context/BinaryPropagation.scala#L170
void java.lang.Throwable.<init>(String) 142486 100 %
void java.lang.Exception.<init>(String) 142486 100 %
void java.lang.RuntimeException.<init>(String) 142486 100 %
void java.lang.UnsupportedOperationException.<init>(String) 142486 100 %
Try scala.util.Success.failed() 142486 100 %
Context kamon.context.BinaryPropagation$Default.read(BinaryPropagation$ByteStreamReader) 142485 100 %
Context kamon.context.BinaryPropagation$Default.read(Object) 142485 100 %
Context kamon.instrumentation.kafka.client.ContextSerializationHelper$.fromByteArray(byte[]) 142485 100 %
Context kamon.instrumentation.kafka.client.RecordProcessor$.$anonfun$process$2(Header) 142485 100 %
Object kamon.instrumentation.kafka.client.RecordProcessor$$$Lambda$1170.322308893.apply(Object) 142485 100 %
Option scala.Option.map(Function1) 142485 100 %
void kamon.instrumentation.kafka.client.RecordProcessor$.$anonfun$process$1(Instant, ConsumedRecordData$ConsumerInfo, ConsumerRecord) 142485 100 %
Object kamon.instrumentation.kafka.client.RecordProcessor$.$anonfun$process$1$adapted(Instant, ConsumedRecordData$ConsumerInfo, ConsumerRecord) 142485 100 %
Object kamon.instrumentation.kafka.client.RecordProcessor$$$Lambda$1169.1580561024.apply(Object) 142485 100 %
Hey @dr3s,
I've been trying to reproduce this issue but no luck so far. I tried with corrupted headers, no headers, different settings, different Kafka versions and all of them worked out.
What additional info could you share to reproduce this issue? and, do you have access to the exception logged here from your logs? This that you have above seems like some sort of aggregate that might be missing some important info :thinking:
This trace was from our profiler. For some reason I don't see the exceptions in the logs and we have the level set to INFO. To give you an idea of volume, I recorded about 1M exceptions per minute. I'll try to just debug locally and capture a trace.
In general though, I think it may be problematic in an APM library to use exceptions in this way. It seems the message we are seeing is from Success.failed
https://github.com/scala/scala/blob/2.13.x/src/library/scala/util/Try.scala#L277
It would be better to log and return a failure type that doesn't wrap an exception as this code will init Throwable and appears to impact performance.
Oh, I see it now!
The exception you are seeing is created by the Success.failed
implementation, but it is never thrown! It means that every time a binary context is deserialized with the binary propagation we are creating an exception (with all the garbage that comes with it) and doing nothing with it :exploding_head:
I'll fix this ASAP.
Thanks for sharing that screenshot @dr3s, everything clicked with it!
#1206 fixes this issue and should be out by the end of this week. Stay tuned for the next release!
Right, I actually do this in my own code and it's probably bad practice. I don't know what the overhead is but Success.failed()
probably shouldn't be doing it either. I wonder if it has changed in Scala 3.
I still don't know where that function is actually being invoked. Is it just because the Try
object is being applied that the failed
function constructs the exception?
It gets called inside the binary propagation here: https://github.com/kamon-io/Kamon/blob/master/core/kamon-core/src/main/scala/kamon/context/BinaryPropagation.scala#L174-L176
The Try
docs state this:
So what is happening is that on the call site above we always have a success, which means .failed
creates a wrapper with an UnsupportedOperationException
that never calls that .foreach
because there is no actual error.