apm-agent-ruby icon indicating copy to clipboard operation
apm-agent-ruby copied to clipboard

Puma Instrumentation

Open jclusso opened this issue 3 years ago • 20 comments

Is there any plan to add Puma instrumentation so we can measure things like queue request time. This is pretty standard in almost every other APM by just adding a header in NGINX.

jclusso avatar Nov 18 '21 00:11 jclusso

@mikker any idea when this will come?

jclusso avatar May 05 '22 16:05 jclusso

I no longer work for Elastic on the agent. Ping @estolfo (hope you're well!)

mikker avatar May 06 '22 10:05 mikker

Hi @jclusso, thanks for following-up. We don't have this planned now but I'll update this issue when we do add it to our roadmap.

estolfo avatar May 11 '22 09:05 estolfo

Just to update with an approximate timeline, I think we can fit this into the next quarter or so. Will update in July if that doesn't turn out to be the case 👍

jaggederest avatar May 18 '22 02:05 jaggederest

@jaggederest that would be awesome! Figured this would be worth sharing. AppSignal does it by having you add the following to your NGINX config.

proxy_set_header X-Request-Start "t=${msec}";

https://docs.appsignal.com/ruby/instrumentation/request-queue-time.html#nginx

jclusso avatar May 18 '22 14:05 jclusso

@jaggederest I was wondering if in the meantime I could do this myself with custom middleware. I saw someone do it with datadog and some simple middleware. The one thing I don't know and can't figure out is how to report something to the APM as part of the request. I know how to do custom instrumentation but is there any documentation on how to just report something taking a certain amount of time on the request? That is the last piece of the puzzle. Would also be happy to make a pull request if I can get it working.

jclusso avatar May 20 '22 14:05 jclusso

Honestly I'm in the same boat getting familiar with this project. I've implemented middleware for this kind of timing before but I'm not familiar enough with Elastic's internal reporting format to say. I think it will be in the spirit of a Span or Transaction adjustment. I encourage you to take a crack at it and see what you discover, but don't spin your wheels for too long if you get stuck. Very happy to accept PRs, even half baked ones are much appreciated, we can work through it if you get part of the way.

jaggederest avatar May 20 '22 21:05 jaggederest

Been playing around with something like This. I got this from someone's example of one for Datadog. I'm not sure if I should make it it's own transaction or span. What I have realized is I think the Span class needs to be modified to allow you to set duration unless I've missed something. Gonna keep playing around with this.

class QueueTimeMiddleware

  def initialize(app)
    @app = app
  end

  def call(env)
    start = env['HTTP_X_REQUEST_START'].to_f
    wait = env['puma.request_body_wait']
    current = Time.now.to_f

    ElasticAPM.start_span 'puma', 'queue_time'
    span = ElasticAPM.current_span
    duration = (current - wait - start).to_i
    span.duration = duration
    ElasticAPM.end_span

    @status, @headers, @response = @app.call(env)
    [@status, @headers, self]
  end

  def each(&block)
    @response.each(&block)
  end

end

jclusso avatar May 20 '22 22:05 jclusso

Not that I strictly recommend this, but there's always instance_variable_set or instance_eval to dig into the span duration. You'll need to do that directly after the end_span call, and it may result in strange things in the UI, so very experimental. Over time I will probably add a method to allow you to pass in a historical start to start_span but I don't have firm ideas about what that will look like at the moment. Thanks for playing with it, this will be useful feedback.

jaggederest avatar May 23 '22 13:05 jaggederest

@jaggederest I almost feel like you just need another method entirely that is called like track_span that takes duration. It seems odd that you'd allow someone to start / end a span in future where you can then change the start time. I also imagine there are other good use cases for just being able to submit time for an already measured span.

jclusso avatar May 23 '22 21:05 jclusso

Development note:

  • [ ] Make sure to add tests for breakdown metrics to ensure that adding queue time "before" the actual transaction start does not skew or alter the breakdown within the transaction

jaggederest avatar May 24 '22 16:05 jaggederest

+1 for this feature

steve21168 avatar Oct 03 '22 21:10 steve21168

@jaggederest curious when this will be available since it says it's for 8.4 and we're on 8.5.2 now.

jclusso avatar Dec 06 '22 17:12 jclusso

Hi @jclusso we don't have this planned right now but looking at your comment here, if you'd like to open a pull request, I'm happy to review it and work with you on it.

estolfo avatar Dec 07 '22 09:12 estolfo

@estolfo I'm not really sure where it belongs and there is nothing really like it to model off of. Can you advise? We currently have this working where we just send it to StatsD. I'm just not sure how to integrate this.

class PumaLatencyMiddleware

  def initialize(app)
    @app = app
  end

  def call(env)
    start   = env['HTTP_X_REQUEST_START']&.sub('t=', '').to_f * 1000
    wait    = env['puma.request_body_wait']
    current = Time.now.to_f * 1000

    StatsD.histogram(
      'puma.queue_time',
      (current - wait - start).to_i
    ) unless start == 0

    @app.call(env)
  end

end

jclusso avatar Dec 07 '22 17:12 jclusso

Hey, checking in on this again regarding my last comment.

jclusso avatar Apr 07 '23 01:04 jclusso

Up! Queue time is one of the most important metrics to check currently servers capacity. Is there a way to report a fixed time in a span?

niltonvasques avatar Jun 18 '23 10:06 niltonvasques

@jclusso I figured out a way to register the queue time.

Maybe it could also be useful for you: https://github.com/niltonvasques/apm-agent-ruby/commit/c66819bd21d92755037213f5ef0a269ecc7e2a9c

I will not open a pull request for that, because this strategy seems way too specific. Here in the gem a more general solution would makes more sense.

image

image

The only thing that I was not able to display, it is the internal transaction chart. For some reason the span type is not showed there and I don't know why:

image

niltonvasques avatar Jun 19 '23 15:06 niltonvasques

@niltonvasques this looks awesome. It seems like it's grouping it in the controller on the Time spent by span chart based on the color. I'm going to dig into this a little more and see if I can't take what you've done and get a PR working for this.

jclusso avatar Jun 20 '23 11:06 jclusso

@estolfo / @jaggederest was wondering if there is any news on support for this or a timeline? While the code @niltonvasques wrote works, I don't see how his implementation could be used to build latency alerts and the like.

jclusso avatar Oct 27 '23 17:10 jclusso