Kamon icon indicating copy to clipboard operation
Kamon copied to clipboard

+ Monix Task and BIO support and Cats IO improvements

Open bogdanromanx opened this issue 3 years ago • 8 comments

  • kamon-cats-io: Added a Tracing object with an operationName function that allows creating named child spans
  • kamon-monix: Added a new module that supports monix-eval Task and monix-bio IO effect types

The span nesting does not work as evidenced by the test AbstractCatsEffectInstrumentationSpec.nest spans correctly that is failing. Any pointers are much appreciated.

The purpose of this PR is to add support for Monix as a separate module, while enhancing the Cats Effect support.

bogdanromanx avatar Oct 20 '20 16:10 bogdanromanx

@ivantopo, @dpsoft, do you have any ideas why the spans are not nested correctly if the context propagation is supposed to work (according to the other cats/monix test)? Could you please give me some pointers on how I can further investigate the problem?

bogdanromanx avatar Oct 23 '20 10:10 bogdanromanx

Hey man, sorry for the delay, I'll go on a deeper dive here soon! For now, the test is kinda iffy, but I've been known to be wrong. If the tests are correct, you most likely need to figure out the correct place to use Kamon.storeContext!

P.S. you can configure your editor to add newlines at the end of files, so github stops putting red marks everywhere!

SimunKaracic avatar Nov 11 '20 16:11 SimunKaracic

Hi so this was an issue I hit a brick wall with when developing https://github.com/jatcwang/kamon-cats-effect

After staring at it for a while with @ivantopo I think we've found the problem.

The gist of the problem is that we're calling scope.close() in a different thread than the original thread where the Scope was created.

See this line https://github.com/kamon-io/Kamon/blob/f3fbf48d908e1cd8cc4490cfdf5efcfe9f0f046f/core/kamon-core/src/main/scala/kamon/context/Storage.scala#L78-L86 Note how in def close(), we have a reference to the array thread local. This is fine if everything happens in one thread, but if close() is called from a different thread, we have disastrous consequences such as context contamination.

The solution (for this problem) is to resolve the thread local again in close()

override def close(): Unit = tls.get()(0) = previousContext

If there are benchmarks we should make sure this doesn't have significant performance impact, since the array trick was originally done for performance reasons. (We should also try removing the array trick altogether if we decide to "fix" this, since at this point it's really not doing anything)

jatcwang avatar Mar 15 '21 23:03 jatcwang

Thanks for this @jatcwang, I'll have a look.

bogdanromanx avatar Mar 17 '21 18:03 bogdanromanx

Hi @jatcwang!

If there are benchmarks we should make sure this doesn't have significant performance impact, since the array trick was originally done for performance reasons. (We should also try removing the array trick altogether if we decide to "fix" this, since at this point it's really not doing anything)

Please take a look: https://github.com/kamon-io/Kamon/blob/master/core/kamon-core-bench/src/main/scala/kamon/bench/ThreadLocalStorageBenchmark.scala if you want measure the performance impact.

dpsoft avatar Mar 17 '21 22:03 dpsoft

Ok this one is interesting, the "old" (supposedly) non-optimized version is faster when I run the benchmark. (I didn't run that many iterations).

Some unscientific benchmarks I ran with my laptop (Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz, 16GB)

On GraalVM:

# JMH version: 1.21
# VM version: JDK 1.8.0_282, OpenJDK 64-Bit Server VM GraalVM CE 21.0.0.2, 25.282-b07-jvmci-21.0-b06
# VM invoker: /usr/lib/jvm/java-8-graalvm/jre/bin/java
# VM options: -XX:MaxInlineLevel=24 -XX:MaxInlineSize=270
# Warmup: 3 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 2 threads, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: kamon.bench.ThreadLocalStorageBenchmark.fastThreadLocal
# Run progress: 50.00% complete, ETA 00:01:20
# Fork: 1 of 1

Benchmark                                       Mode  Cnt   Score   Error  Units
ThreadLocalStorageBenchmark.currentThreadLocal  avgt    5  16.441 ± 4.799  ns/op
ThreadLocalStorageBenchmark.fastThreadLocal     avgt    5  13.701 ± 1.391  ns/op

and on OpenJDK:

# JMH version: 1.21
# VM version: JDK 1.8.0_282, OpenJDK 64-Bit Server VM, 25.282-b08
# VM invoker: /usr/lib/jvm/java-8-openjdk/jre/bin/java
# VM options: -XX:MaxInlineLevel=24 -XX:MaxInlineSize=270
# Warmup: 3 iterations, 10 s each
# Measurement: 5 iterations, 10 s each

Benchmark                                       Mode  Cnt   Score   Error  Units
ThreadLocalStorageBenchmark.currentThreadLocal  avgt    5  17.482 ± 0.717  ns/op
ThreadLocalStorageBenchmark.fastThreadLocal     avgt    5  12.808 ± 0.683  ns/op

Note that "currentThreadLocal" is the naive, non-optimized version. So yeah the optimized version is consistently faster but for my use case (Mainly future/IO instead of akka) I'd say the impact is negligible.

jatcwang avatar Mar 18 '21 23:03 jatcwang

Great work @jatcwang! If you think this implementation would be useful to you, feel free to add it. Just add a (well documented :D) setting that lets users choose which implementation they'd like to use, with the current one as default!

SimunKaracic avatar Mar 19 '21 12:03 SimunKaracic

@SimunKaracic PR up in #961

jatcwang avatar Mar 20 '21 13:03 jatcwang