Kamon
+ Monix Task and BIO support and Cats IO improvements
- kamon-cats-io: Added a `Tracing` object with an `operationName` function that allows creating named child spans
- kamon-monix: Added a new module that supports the monix-eval `Task` and monix-bio `IO` effect types
Span nesting does not work, as evidenced by the failing test `AbstractCatsEffectInstrumentationSpec` ("nest spans correctly"). Any pointers are much appreciated.
The purpose of this PR is to add support for Monix as a separate module, while enhancing the Cats Effect support.
@ivantopo, @dpsoft, do you have any ideas why the spans are not nested correctly if the context propagation is supposed to work (according to the other cats/monix test)? Could you please give me some pointers on how I can further investigate the problem?
Hey man, sorry for the delay, I'll go on a deeper dive here soon!
For now, the test is kinda iffy, but I've been known to be wrong.
If the tests are correct, you most likely need to figure out the correct place to use `Kamon.storeContext`!
P.S. you can configure your editor to add newlines at the end of files, so github stops putting red marks everywhere!
Hi, this was an issue I hit a brick wall with when developing https://github.com/jatcwang/kamon-cats-effect
After staring at it for a while with @ivantopo I think we've found the problem.
The gist of the problem is that we're calling scope.close() in a different thread than the original thread where the Scope was created.
See these lines:
https://github.com/kamon-io/Kamon/blob/f3fbf48d908e1cd8cc4490cfdf5efcfe9f0f046f/core/kamon-core/src/main/scala/kamon/context/Storage.scala#L78-L86
Note how in `def close()` we hold a reference to the thread-local array. This is fine if everything happens in one thread, but if `close()` is called from a different thread, we get disastrous consequences such as context contamination.
The solution (for this problem) is to resolve the thread local again in `close()`:

```scala
override def close(): Unit = tls.get()(0) = previousContext
```
If there are benchmarks we should make sure this doesn't have significant performance impact, since the array trick was originally done for performance reasons. (We should also try removing the array trick altogether if we decide to "fix" this, since at this point it's really not doing anything)
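To make the failure mode concrete, here is a minimal, self-contained sketch (not Kamon's actual code; all names are invented for illustration). `storeCapturing` mimics the current `Scope`, whose `close()` closes over the creating thread's backing array, while `storeResolving` mimics the proposed fix, which re-resolves the thread local inside `close()`:

```scala
// Simplified model of a thread-local context storage with the "array trick":
// each thread keeps its current context in a one-element array.
object ScopeSketch {
  type Context = String
  val Empty: Context = "empty"

  trait Scope { def close(): Unit }

  private val tls = new ThreadLocal[Array[Context]] {
    override def initialValue(): Array[Context] = Array(Empty)
  }

  def current(): Context = tls.get()(0)

  // Current behavior: close() keeps a reference to the array of the thread
  // that CREATED the Scope, so a close() from another thread writes into
  // the wrong thread's storage.
  def storeCapturing(ctx: Context): Scope = {
    val slot = tls.get()
    val previous = slot(0)
    slot(0) = ctx
    new Scope { def close(): Unit = slot(0) = previous }
  }

  // Proposed fix: resolve the thread local again inside close(), so the
  // restore always targets the storage of the thread that calls close().
  def storeResolving(ctx: Context): Scope = {
    val previous = tls.get()(0)
    tls.get()(0) = ctx
    new Scope { def close(): Unit = tls.get()(0) = previous }
  }

  // Helper: run f on a fresh thread and wait for it to finish.
  def onNewThread(f: () => Unit): Unit = {
    val t = new Thread(new Runnable { def run(): Unit = f() })
    t.start(); t.join()
  }
}
```

With `storeCapturing`, creating a scope on the main thread and closing it from a worker thread overwrites the main thread's slot while it is conceptually still inside the scope; with `storeResolving`, the worker's own slot is restored and the main thread's storage is left untouched.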
Thanks for this @jatcwang, I'll have a look.
Hi @jatcwang!
If there are benchmarks we should make sure this doesn't have significant performance impact, since the array trick was originally done for performance reasons. (We should also try removing the array trick altogether if we decide to "fix" this, since at this point it's really not doing anything)
Please take a look at https://github.com/kamon-io/Kamon/blob/master/core/kamon-core-bench/src/main/scala/kamon/bench/ThreadLocalStorageBenchmark.scala if you want to measure the performance impact.
OK, this one is interesting: the "old", supposedly non-optimized version is faster when I run the benchmark (I didn't run that many iterations).
Some unscientific benchmarks I ran on my laptop (Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz, 16GB):
On GraalVM:
```
# JMH version: 1.21
# VM version: JDK 1.8.0_282, OpenJDK 64-Bit Server VM GraalVM CE 21.0.0.2, 25.282-b07-jvmci-21.0-b06
# VM invoker: /usr/lib/jvm/java-8-graalvm/jre/bin/java
# VM options: -XX:MaxInlineLevel=24 -XX:MaxInlineSize=270
# Warmup: 3 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 2 threads, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: kamon.bench.ThreadLocalStorageBenchmark.fastThreadLocal
# Run progress: 50.00% complete, ETA 00:01:20
# Fork: 1 of 1

Benchmark                                       Mode  Cnt   Score   Error  Units
ThreadLocalStorageBenchmark.currentThreadLocal  avgt    5  16.441 ± 4.799  ns/op
ThreadLocalStorageBenchmark.fastThreadLocal     avgt    5  13.701 ± 1.391  ns/op
```
and on OpenJDK:
```
# JMH version: 1.21
# VM version: JDK 1.8.0_282, OpenJDK 64-Bit Server VM, 25.282-b08
# VM invoker: /usr/lib/jvm/java-8-openjdk/jre/bin/java
# VM options: -XX:MaxInlineLevel=24 -XX:MaxInlineSize=270
# Warmup: 3 iterations, 10 s each
# Measurement: 5 iterations, 10 s each

Benchmark                                       Mode  Cnt   Score   Error  Units
ThreadLocalStorageBenchmark.currentThreadLocal  avgt    5  17.482 ± 0.717  ns/op
ThreadLocalStorageBenchmark.fastThreadLocal     avgt    5  12.808 ± 0.683  ns/op
```
Note that "currentThreadLocal" is the naive, non-optimized version. So yes, the optimized version is consistently faster, but for my use case (mainly Future/IO instead of Akka) I'd say the impact is negligible.
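For intuition about what the two benchmark variants are measuring, here is a crude, unscientific sketch (plain `System.nanoTime`, no JMH, all names invented) of a `ThreadLocal` get/set round trip versus the "array trick", where the array reference is resolved once and element writes replace further `ThreadLocal` lookups. Numbers from a loop like this are only indicative; use the linked JMH benchmark for real measurements:

```scala
// Crude timing sketch: plain ThreadLocal round trips vs. the array trick.
object TlsTimingSketch {
  private val plain = new ThreadLocal[String] {
    override def initialValue(): String = "empty"
  }
  private val slotTl = new ThreadLocal[Array[String]] {
    override def initialValue(): Array[String] = Array("empty")
  }

  // Plain variant: every store/restore goes through ThreadLocal get/set.
  def plainRoundTrips(n: Int): Long = {
    val start = System.nanoTime()
    var i = 0
    while (i < n) {
      val previous = plain.get()
      plain.set("ctx")
      plain.set(previous)
      i += 1
    }
    System.nanoTime() - start
  }

  // Array-trick variant: resolve the array once, then mutate element 0.
  def arrayRoundTrips(n: Int): Long = {
    val slot = slotTl.get()
    val start = System.nanoTime()
    var i = 0
    while (i < n) {
      val previous = slot(0)
      slot(0) = "ctx"
      slot(0) = previous
      i += 1
    }
    System.nanoTime() - start
  }
}
```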
Great work @jatcwang! If you think this implementation would be useful to you, feel free to add it. Just add a (well documented :D) setting that lets users choose which implementation they'd like to use, with the current one as default!
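The suggested setting could look something like the following sketch; everything here (the trait, object names, and the setting values) is invented for illustration and is not Kamon's actual configuration surface:

```scala
// Hypothetical selection of a context-storage implementation from a
// documented config value, keeping the current behavior as the default.
object StorageSelectionSketch {
  sealed trait StorageImpl { def name: String }
  case object ThreadLocalStorage extends StorageImpl { val name = "thread-local" }  // current default
  case object CrossThreadStorage extends StorageImpl { val name = "cross-thread" }  // new implementation

  def storageFor(setting: String): StorageImpl = setting match {
    case "cross-thread" => CrossThreadStorage
    case _              => ThreadLocalStorage  // unknown values fall back to the default
  }
}
```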
@SimunKaracic PR up in #961