dry-monitor icon indicating copy to clipboard operation
dry-monitor copied to clipboard

Payload enhancements

Open paul opened this issue 4 years ago • 8 comments

I've added a couple features to Dry::Monitor that I missed from AS::Notifications :speak_no_evil:. Details about each are in the commit messages.

Yield the payload to the instrumented block

Allows the instrumented code to add more data to the payload, useful when needing instrument something involving a response:

    instrument("example.request", url: url, body: body) do |payload|
      response = http.post(url, params: body)
      payload[:response] = response
    end

Trigger subscribers even when the instrumented code raises an exception

If the instrumented code block raises an exception, adds that exception to the payload and re-raises (so the backtrace remains the same). This allows subscribers to log the exception as well.

    instrument("example.request") do
      client.post("http://url.that.fails.example/") # => ClientError
    end

I wanted to add docs for these features, but it seems this gem doesn't have any documentation? Also, it seems like this repo needs the "devtools update" that other repos have gotten, since codeclimate won't build it.

paul avatar Nov 30 '19 06:11 paul

Yield the payload to the instrumented block

This would mutate the payload and we should not go that way. Not only that, when no payload giving, mutating will actually raise an error:

    it 'allows modifying the default payload ' do
      captured = []

      notifications.subscribe(:sql) do |event|
        captured << event
      end

      notifications.instrument(:sql) do |payload|
        payload[:modified] = true
      end
    end

  # Effect


  1) Subscribing to instrumentation events #instrument allows modifying the default payload 
     Failure/Error: payload[:modified] = true
     
     FrozenError:
       can't modify frozen Hash
     # ./spec/integration/instrumentation_spec.rb:19:in `block (4 levels) in <top (required)>'
     # ./lib/dry/monitor/notifications.rb:43:in `block in instrument'
     # ./lib/dry/monitor/notifications.rb:11:in `measure'
     # ./lib/dry/monitor/notifications.rb:43:in `instrument'
     # ./spec/integration/instrumentation_spec.rb:18:in `block (3 levels) in <top (required)>'

the reason why this feature was not in the dry-monitor in the first place, is that it is dangerous. We're not always the direct owners of the payload and there are cases where it comes from ta different bounded context, thus modifying it would have side-effects.

Trigger subscribers even when the instrumented code raises an exception

As far as I appreciate the effort, I'm also against this feature.

First of all, catching the Exception, even with a re-raise may also catch things like SignalException thus making it harder to debug certain things (and implement when wrapped with instrumentation).

The second argument against the code modifies the payload (that is commented above).

Third argument: it makes the end-user code harder to develop (from the subscriber perspective) . I Publishing the same event upon error is definetely not something that should happen. I would recommend by design that users publish a separate event for errors / exceptions, thus having a more granular flow.

mensfeld avatar Nov 30 '19 12:11 mensfeld

Not only that, when no payload giving, mutating will actually raise an error:

Right, I mentioned that in my commit message. I didn't want to make the change for the default payload from the frozen EMPTY_HASH to a new hash every time, because of the (arguably minuscule) performance hit.

We're not always the direct owners of the payload and there are cases where it comes from a different bounded context, thus modifying it would have side-effects.

I'm not clear on who the "we" is in this context? If "we" are dry-monitor, then the first change isn't modifying the payload at all. If "we" are the authors of the code we are instrumenting, then we are the owners of that payload.

First of all, catching the Exception, even with a re-raise may also catch things like SignalException thus making it harder to debug certain things

I'm aware of the issues with catching Exception vs StandardError, and chose it deliberately. I was also careful to re-raise the exception with Ruby's bare raise feature, which won't result in a different backtrace. If I'm instrumenting some bit of code in a long-running daemon process that sends a message over the network, and the daemon is restarted, I should still instrument that the network call happened.

My actual use-case is trying to instrument an HTTP client that I don't control, and send the timing and response information to InfluxDB. Here's a simplified version of that code:

instrument("client.write", url: url, body: body) do |payload|
  response = client.write(url, body) # raises ClientError on any non-2xx status
  payload[:response] = response
  response
rescue ClientError => ex
  payload[:response] = ex.response
  exception_notifier.capture(ex)
  raise
end

My subscriber needs access to the response object because I want to capture the response status code in my dashboard graphs. With no access to the payload inside the block, I can't add the response status to it. Also, if the status code is 4xx or 5xx, the upstream client code raises an exception, and in the current implementation of dry-monitor, none of my subscribers are called.

I can sorta hack around the lack of the first feature by doing this:

payload = {url: url, body: body}
instrument("client.write", payload) do
  response = client.write(url, body)
  payload[:response] = response
  response
rescue ClientError => ex
  payload[:response] = ex.response # Doesn't work because `payload` isn't in scope here
end

However, if the block ever raises an exception (and I want that exception to bubble through), I have to write instrumented code like this (I think, I haven't actually tried it).

payload = {url: url, body: body}
exception = nil
instrument("client.write", payload) do
  begin
    response = client.write(url, body)
    payload[:response] = response
    response
  rescue ClientError => ex
    payload[:response] = ex.response # Still not sure if `payload` is in scope
    exception = ex # Of if exception is as well
  end
end
raise exception # Now I've also lost the original backtrace

Even if it does work, this code is far more complicated and error prone for the user.

I urge you to reconsider, these features are absolutely critical for anyone looking for an AS::Notifications replacement in non-Rails applications. I would completely agree with you if supporting these features made the notifications code way more complicated, but this is a fairly trivial amount of code to unlock some very powerful functionality.

paul avatar Nov 30 '19 16:11 paul

I've rebased this branch with the devtools changes on master.

paul avatar Nov 30 '19 16:11 paul

In general, we're looking at two things here:

  1. Support for adding more data to the payload within instrumentation blocks
  2. Support for rescuing and re-raising exceptions

That's why I'd appreciate if this PR could be split into two. I also think that exception handling could be implemented as either a plugin or configurable feature. I'm not sure yet if handling exceptions like in this PR by default is a good idea, especially that you'd like to handle Exception which is usually risky business.

@paul does this make sense?

I wanted to add docs for these features, but it seems this gem doesn't have any documentation?

Yeah dry-monitor is still considered "too beta". We should be able to finally make it stable some time in 2020, then proper docs will be written.

solnic avatar Dec 06 '19 10:12 solnic

Quick follow-up after talking to @mensfeld about it - he's got a very good point that handling exceptions within the same instrumentation block is an anti-pattern and it makes more sense to do something like this.

I gotta say my brain is not entirely "in the zone" here so I need some time to think about it further. Let's put this on hold for now and I'll get back to you with more thoughts later this month. We can of course continue discussion in the meantime.

solnic avatar Dec 06 '19 10:12 solnic

@solnic Sure, I can split this into two PRs, no problem.

My reasoning behind rescuing Exception rather than StandardError is that in the situation where you're instrumenting some code that makes a network request, and the process gets killed by ^C (SystemExit) or some other Interrupt that inherits from Exception rather than StandardError, you probably still want to instrument that it happened. Depending on the network request and when it was interrupted, the "change" may have happened on the remote but if there's no corresponding instrumentation event fired.

It looks like @josevalim wrote it into AS::Notifications 10 years ago, but the commit doesn't explain why. I'll see if I can ask him if he remembers. https://github.com/rails/rails/commit/a76c7e68d5e39f5962d9cb85c98e6a8e96f8b3af

Edited to add: I think this makes the most sense if you're instrumenting something that has a log subscriber. Even if the process gets killed before the instrumented block finished, the logger should probably still get called to log the event. I suppose it really depends on what you're using dry-monitor for, if its for "logging and instrumentation" like AS::Notifications, or if its an event bus like Wisper. My use-case was logging and shipping timing metrics to influxdb which I want to happen even if my server dies.

paul avatar Dec 06 '19 16:12 paul

I was fooling around with dry-system and enabled monitoring. I did a few tests with a UserRepo and actually expected the exception to be caught and the subscriber called, with an exception key or something similar. I also expected the exception to bubble up at the top-level so that it could be caught by callers. Monitoring should be transparent to callers, that I fully agree with.

I did learn about rescuing exceptions within blocks, so thank you @paul for that new bit of knowledge!

@solnic I believe you meant to highlight this block of code? https://github.com/karafka/karafka/blob/c68e0b0462d016ea04e92e88e728d66b5aef7f42/lib/karafka/params/params.rb#L68-L76

Karafka's master changed since the time you created you link and I was confused for a bit because the lines that are highlighted today don't even have any exception handling code.

francois avatar Aug 05 '20 14:08 francois

oh boy, I completely lost track on this one, sorry about that! I'll be working on dry-monitor 1.0.0 for hanami 2.0.0 release, so this will be a good moment for me to revisit this. We'll for sure take this library to the next level, so feedback like the one here is really appreciated.

For now I'm assigning this PR to 1.0.0 milestone.

@solnic I believe you meant to highlight this block of code? https://github.com/karafka/karafka/blob/c68e0b0462d016ea04e92e88e728d66b5aef7f42/lib/karafka/params/params.rb#L68-L76

@francois I don't even remember now, but most likely, yes.

solnic avatar Aug 06 '20 08:08 solnic