Kamon icon indicating copy to clipboard operation
Kamon copied to clipboard

+ kamon-armeria module

Open lucasamoroso opened this issue 4 years ago • 29 comments

Armeria http server and client instrumentation.

I tested it manually and added tests based on okhttp and akka-http instrumentation modules tests.

Still there is work to do like the https/grpc server instrumention, but i think this PR is a good Armeria module starting point.

Thanks to @dpsoft for his help :)

lucasamoroso avatar Sep 13 '20 03:09 lucasamoroso

I tried this out with https://github.com/line/armeria-examples/tree/master/annotated-http-service The example breaks, and I can't get the /docs endpoint. Stacktrace:

java.lang.NullPointerException
        at kamon.armeria.instrumentation.server.ServeMethodAdvisor$.around(ArmeriaHttpServerInstrumentation.scala:79)
        at com.linecorp.armeria.server.FallbackService.serve(FallbackService.java:40)
        at com.linecorp.armeria.server.HttpServerHandler.handleRequest(HttpServerHandler.java:384)
        at com.linecorp.armeria.server.HttpServerHandler.channelRead(HttpServerHandler.java:249)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
        at com.linecorp.armeria.server.ArmeriaHttpServerRequestHandler.channelRead(ArmeriaHttpServerRequestHandler.scala:36)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
        at com.linecorp.armeria.server.Http1RequestDecoder.channelRead(Http1RequestDecoder.java:215)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:436)
        at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296)
        at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:251)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.handler.codec.ByteToMessageDecoder.handlerRemoved(ByteToMessageDecoder.java:253)
        at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:508)
        at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:440)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:271)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.handler.flush.FlushConsolidationHandler.channelRead(FlushConsolidationHandler.java:152)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
        at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:792)
        at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:475)
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:748)

I'll try out the other examples as well, and report back.

SimunKaracic avatar Oct 07 '20 09:10 SimunKaracic

Hey @lucasamoroso,

I took a deeper look at this PR today and found three things that are important to address:

  • Handling the context scope properly. I noticed that the ArmeriaHttpServerRequestHandler is storing the incoming Context instance, and then it gets cleared out on ArmeriaHttpServerResponseHandler, but as far as I understand, there is no guarantee that those two calls will happen on the same thread. I started scratching the surface of Armeria's threading model and so far, I can't say for sure that the behavior in this PR is appropriate. My guess is that there is a separation between Armeria's threads (running the event loops and all Armeria-specific code) and the threads that will actually run the user code (i.e. the service we are implementing), but I will need more research and testing to be sure.
  • Attaching Kamon's Context to Armeria's Channels. I'm not entirely sure if that is a good idea, specially because of the previous point. Also, I noticed that Armeria's RequestContext can carry additional attributes and that makes me wonder: is there any particular reason why the Kamon context is attached to the channels rather than Armeria's RequestContext?
  • Operation Naming. We should probably be moving to a more gRPC-like naming scheme, where the operation names (and endpoints) look more like ServiceClassName.methodName rather than using the HTTP path.

Now, we love Armeria and we want this thing merged and ready to roll! So here is what we will be doing in the next couple days:

  • Try to understand the threading model inside Armeria and make sure that we are not leaving dirty threads nor opening race conditions with Kamon's Context propagation.
  • Run all Armeria examples with this instrumentation and uncover any issues that might come up (@SimunKaracic already dropped some of that above).
  • Investigate further how the Prometheus and Zipkin integrations are working with naming and metering. There might be a couple interesting details to learn from there!

Looking forward to your ideas and feedback :smiley:

ivantopo avatar Oct 07 '20 12:10 ivantopo

@ivantopo,@SimunKaracic Thanks for the review!

I'll be doing some testing with Armeria examples, for now i've no problems with the Operation Name refactor but i need to take a closer look at the other comments.
I'll working on this in the next couple of days!

lucasamoroso avatar Oct 07 '20 23:10 lucasamoroso

Hi! In the last couple of days, with @dpsoft help, we've been changing the current implementation. I've been doing this in another repo because is simple to me to do some manual testing the repo is https://github.com/lucasamoroso/kamon-armeria-test. I'am currently doing manual testing with this armeria-examples taken from Armeria. Until now everithing is working fine so as soon as i finish this manual testing i'll push the changes here and i think with this changes we'll be able to work with grpc and https (but i've to test it).

lucasamoroso avatar Oct 21 '20 16:10 lucasamoroso

@ivantopo about:

  • ** Operation Naming **. We should probably be moving to a more gRPC-like naming scheme, where the operation names (and endpoints) look more like ServiceClassName.methodName rather than using the HTTP path.

I'm not sure about moving to a gRPC-like naming scheme for all Armeria services. Is really common practice have gRPC services mixed with plain http services in an Armeria application. I think that we need other strategy for gRPC naming services in addition to the current one.

dpsoft avatar Oct 21 '20 17:10 dpsoft

Hi!
I pushed a reimplementation of this module, i think is a better approach. I've been doing manual testing with armeria-examples no big issue found.
In the next couple of days i'll work with these things:

  • There is a problem with some armeria-examples and the default unhandled operation name, i left a comment in the code.
  • gRPC and https, i think with this new implementation it should be easy to work with this protocols (until now i only tested http).

@ivantopo About this:

Investigate further how the Prometheus and Zipkin integrations are working with naming and metering. There might be a couple interesting details to learn from there!

I haven't had time to investigate yet but I will.

Looking for a feedback about the work done and if there is something more to do in this PR 😄

lucasamoroso avatar Oct 23 '20 00:10 lucasamoroso

1. You are storing the context in 4 different places, but never closing the returning scope. Won't that leave dirty threads around?

The contexts are stored inside a try-with-resources, i understand that the returned scope will be closed after the execution of the try block.

2. You are setting the `CLIENT_TRACE_SCOPE_KEY` in Armerias context, but not reading it from anywhere?

I'm using this in here https://github.com/kamon-io/Kamon/pull/854/files#diff-051d01483bf35bea603907712f76837cc9fd2e519e321daa3bb86229753865dfR49 but i'm not sure it's needed. For sure we need to propagate the Kamon context before call unwrap(), like this:

   try (Storage.Scope scope = Kamon.storeContext(Kamon.currentContext())) {
      ctx.setAttr(CLIENT_TRACE_SCOPE_KEY, scope);
      return unwrap().execute(ctx, requestHandler.request());
    } catch (Exception exception) {
      requestHandler.span().fail(exception.getMessage(), exception).finish();
      throw exception;
    }

But when the request is complete:

    ctx.log()
        .whenComplete()
        .thenAccept(log -> {
          try (Storage.Scope ignored = Kamon.storeContext(ctx.attr(CLIENT_TRACE_SCOPE_KEY).context())) {
            Timing.takeTimings(log, requestHandler.span());
            requestHandler.processResponse(KamonArmeriaMessageConverter.toKamonResponse(log));
          }
        });

I'm not sure if it's necessary to run

     Timing.takeTimings(log, requestHandler.span());
     requestHandler.processResponse(KamonArmeriaMessageConverter.toKamonResponse(log));

with Kamon context.

As an example, that I hope clarifies what I am trying to say, if i ran this code:

    ctx.log()
        .whenComplete()
        .thenAccept(log -> {
          try (Storage.Scope ignored = Kamon.storeContext(ctx.attr(CLIENT_TRACE_SCOPE_KEY).context())) {
            logger.info("processing kamon context " + Kamon.currentContext().hashCode());
            Timing.takeTimings(log, requestHandler.span());
            requestHandler.processResponse(KamonArmeriaMessageConverter.toKamonResponse(log));
          }
        });

The log line will show that the Kamon context is the same that we've configured in the Armerias context. But if i ran this code

    ctx.log()
        .whenComplete()
        .thenAccept(log -> {
            logger.info("processing kamon context " + Kamon.currentContext().hashCode());
            Timing.takeTimings(log, requestHandler.span());
            requestHandler.processResponse(KamonArmeriaMessageConverter.toKamonResponse(log));
        });

The log line will show that the Kamon context is a different from what we've configured in the Armerias context.

In this case the requestHandler was created with a Kamon context final HttpClientInstrumentation.RequestHandler<HttpRequest> requestHandler = clientInstrumentation.createHandler(KamonArmeriaMessageConverter.getRequestBuilder(req), Kamon.currentContext()); so i'm not entirly sure if that context will'be used always, no matter which Kamon context we're running in always that requestHandler's context will be used, so this try (Storage.Scope ignored = Kamon.storeContext(ctx.attr(CLIENT_TRACE_SCOPE_KEY).context())) inside thenAccept will be unnecessary. In the the server decorator i've the same question, i hope someone with more experience with Kamon can help me.

lucasamoroso avatar Nov 09 '20 22:11 lucasamoroso

Hello!! I think this pr is ready, I have been making a lot of changes in the last time because the first approach I gave to the instrumentation did not convince us much. I think this new approach is much better and it helped us add grpc and https in a very simple way, I just have a doubt regarding the propagation of the Kamon context that I left in one of the comments in response to @SimunKaracic

lucasamoroso avatar Nov 22 '20 14:11 lucasamoroso

Hey @lucasamoroso, I wanted to get into this PR and tried to rebase it to master locally but it didn't go well. There are some minor changes to tests due to the Scalatest version upgrade, but after finishing those changes the tests are not passing anymore :cry:

Do you have time to try rebasing and pushing the updates? Otherwise let me know and I can move it forward :muscle:

ivantopo avatar Feb 15 '22 13:02 ivantopo

Hi @ivantopo! Sure I'll be working on this

lucasamoroso avatar Feb 15 '22 13:02 lucasamoroso

Hey @ivantopo!

I did a rebase and fixed the tests. The problem with the tests was that they must extend from InitAndStopKamonAfterAll to start and stop Kamon.

lucasamoroso avatar Feb 18 '22 12:02 lucasamoroso

Hey @lucasamoroso, thanks for the update!

Generally speaking it looks fine and tests pass locally, but I couldn't make it work with one of my Armeria applications by publishing locally and trying it out :shrug:

I spent some time trying to debug it and figured out that the ArmeriaServerBuilderAdvisor code is not really running locally (tried both using -javaagent and with the bundle). I did some more digging and realized that if I disable the bridge on ServerBuilder then the instrumentation gets picked up:

class ArmeriaHttpServerInstrumentation extends InstrumentationBuilder {
  onType("com.linecorp.armeria.server.ServerBuilder")
    .advise(method("build"), classOf[ArmeriaServerBuilderAdvisor])
//    .bridge(classOf[ServerBuilderInternalState])

  onType("com.linecorp.armeria.server.FallbackService")
    .advise(method("handleNotFound"), classOf[HandleNotFoundMethodAdvisor])
}

But then I had to patch the server ports for it not to blow up. My guess is that when the bridge is in place the entire ServerBuilder class gets loaded before the advise is applied. The only difference I found is that all Kamon tests are running with Java 8, but the app I'm running is using Java 17. Also tried with Java 11 and it didn't work either.

Have you tried it with newer java versions?

ivantopo avatar Feb 23 '22 10:02 ivantopo

Hi @ivantopo !

I've tried with all java versions but the problem was with the attach mode. This commit should fix that issue.

I'll stay tuned for comments 😄

lucasamoroso avatar Feb 25 '22 14:02 lucasamoroso

So what is the status of armeria module ?

yarosman avatar Apr 03 '22 23:04 yarosman

So what is the status of armeria module ?

Hi @yarosman!

I've resolved all @ivantopo's comments and I'm waiting for more comments or maybe this can be merged.

Are you interested in using this module?

lucasamoroso avatar Apr 04 '22 22:04 lucasamoroso

So what is the status of armeria module ?

Hi @yarosman!

I've resolved all @ivantopo's comments and I'm waiting for more comments or maybe this can be merged.

Are you interested in using this module?

Yeap, I wanna try it. @lucasamoroso, did you publish any snapshot version ?

yarosman avatar Apr 11 '22 17:04 yarosman

And I see there is v1.15.0 available

yarosman avatar Apr 11 '22 18:04 yarosman

And I see there is v1.15.0 available

I've updated armeria dependency and rebase with the upstream

lucasamoroso avatar Apr 11 '22 23:04 lucasamoroso

@lucasamoroso awesome!

Hello. Are you going to add this module? It's look like a good replace for akka-http in tapir )

yarosman avatar Sep 09 '22 09:09 yarosman

@yarosman @ivantopo is finishing some tests and if everything goes well, we will be publishing the module soon.

dpsoft avatar Sep 09 '22 18:09 dpsoft

Hi @ivantopo!

I'll try to address your comments this week.

About

Armeria has its own context propagation mechanisms (see the makeContextAware variants in Armeria's RequestContext for more details). Should we try to propagate Kamon's context on those places?

I think we should try to propagate Kamon's context on those places but I'm not sure how much work will be required. I'm thinking to do this in another PR (if it takes a lot of work). Let me dig into it after addressing the other comments

lucasamoroso avatar Sep 13 '22 22:09 lucasamoroso

Hello. @ivantopo you wrote that you migrate several services to armeria. Are you going to add kamon supporting ?

yarosman avatar Dec 17 '22 14:12 yarosman

Hey @yarosman. Yes! I'm going to make it my new year's resolution to get this PR merged and released ASAP!

We indeed have almost all of our services on Armeria now, but we did manual instrumentation on them long ago so kind of lost the urgency to do this for our own use case. But now is the time :pray: (I might have said that before, hopefully this will REALLY be the time!)

ivantopo avatar Jan 10 '23 11:01 ivantopo

We indeed have almost all of our services on Armeria now

@ivantopo, how about adding you to the adopters list and get some goods, then? 😉 https://github.com/line/armeria/discussions/4626

trustin avatar Mar 20 '23 08:03 trustin

@ivantopo Hi! Any ETA to merge PR? Friendly ping

oskoi avatar Apr 10 '23 10:04 oskoi

Commented some comments. Lot went on since then - if anybody has some context on those places, would appreciate it - I don't use armeria

hughsimpson avatar Oct 22 '23 22:10 hughsimpson

Commented some comments. Lot went on since then - if anybody has some context on those places, would appreciate it - I don't use armeria

Sadly I don't remember where are we right now, I stopped using Armeria about 2/3 years ago, because I left the company where I was using it, so I also lost track of the new features that Armeria added that could help us to resolve those issues.

If we want this feature I could help updating the library and trying to understand where are we right now.

lucasamoroso avatar Oct 23 '23 16:10 lucasamoroso