dry-monitor
dry-monitor copied to clipboard
Payload enhancements
I've added a couple features to Dry::Monitor that I missed from AS::Notifications :speak_no_evil:. Details about each are in the commit messages.
Yield the payload to the instrumented block
Allows the instrumented code to add more data to the payload
, useful when needing instrument something involving a response:
instrument("example.request", url: url, body: body) do |payload|
response = http.post(url, params: body)
payload[:response] = response
end
Trigger subscribers even when the instrumented code raises an exception
If the instrumented code block raises an exception, adds that exception to the payload
and re-raises (so the backtrace remains the same). This allows subscribers to log the exception as well.
instrument("example.request") do
client.post("http://url.that.fails.example/") # => ClientError
end
I wanted to add docs for these features, but it seems this gem doesn't have any documentation? Also, it seems like this repo needs the "devtools update" that other repos have gotten, since codeclimate won't build it.
Yield the payload to the instrumented block
This would mutate the payload and we should not go that way. Not only that, when no payload giving, mutating will actually raise an error:
it 'allows modifying the default payload ' do
captured = []
notifications.subscribe(:sql) do |event|
captured << event
end
notifications.instrument(:sql) do |payload|
payload[:modified] = true
end
end
# Effect
1) Subscribing to instrumentation events #instrument allows modifying the default payload
Failure/Error: payload[:modified] = true
FrozenError:
can't modify frozen Hash
# ./spec/integration/instrumentation_spec.rb:19:in `block (4 levels) in <top (required)>'
# ./lib/dry/monitor/notifications.rb:43:in `block in instrument'
# ./lib/dry/monitor/notifications.rb:11:in `measure'
# ./lib/dry/monitor/notifications.rb:43:in `instrument'
# ./spec/integration/instrumentation_spec.rb:18:in `block (3 levels) in <top (required)>'
the reason why this feature was not in the dry-monitor
in the first place, is that it is dangerous. We're not always the direct owners of the payload and there are cases where it comes from ta different bounded context, thus modifying it would have side-effects.
Trigger subscribers even when the instrumented code raises an exception
As far as I appreciate the effort, I'm also against this feature.
First of all, catching the Exception
, even with a re-raise may also catch things like SignalException
thus making it harder to debug certain things (and implement when wrapped with instrumentation).
The second argument against the code modifies the payload (that is commented above).
Third argument: it makes the end-user code harder to develop (from the subscriber perspective) . I Publishing the same event upon error is definetely not something that should happen. I would recommend by design that users publish a separate event for errors / exceptions, thus having a more granular flow.
Not only that, when no payload giving, mutating will actually raise an error:
Right, I mentioned that in my commit message. I didn't want to make the change for the default payload from the frozen EMPTY_HASH
to a new hash every time, because of the (arguably minuscule) performance hit.
We're not always the direct owners of the payload and there are cases where it comes from a different bounded context, thus modifying it would have side-effects.
I'm not clear on who the "we" is in this context? If "we" are dry-monitor, then the first change isn't modifying the payload at all. If "we" are the authors of the code we are instrumenting, then we are the owners of that payload.
First of all, catching the Exception, even with a re-raise may also catch things like SignalException thus making it harder to debug certain things
I'm aware of the issues with catching Exception vs StandardError, and chose it deliberately. I was also careful to re-raise the exception with Ruby's bare raise
feature, which won't result in a different backtrace. If I'm instrumenting some bit of code in a long-running daemon process that sends a message over the network, and the daemon is restarted, I should still instrument that the network call happened.
My actual use-case is trying to instrument an HTTP client that I don't control, and send the timing and response information to InfluxDB. Here's a simplified version of that code:
instrument("client.write", url: url, body: body) do |payload|
response = client.write(url, body) # raises ClientError on any non-2xx status
payload[:response] = response
response
rescue ClientError => ex
payload[:response] = ex.response
exception_notifier.capture(ex)
raise
end
My subscriber needs access to the response object because I want to capture the response status code in my dashboard graphs. With no access to the payload inside the block, I can't add the response status to it. Also, if the status code is 4xx or 5xx, the upstream client code raises an exception, and in the current implementation of dry-monitor, none of my subscribers are called.
I can sorta hack around the lack of the first feature by doing this:
payload = {url: url, body: body}
instrument("client.write", payload) do
response = client.write(url, body)
payload[:response] = response
response
rescue ClientError => ex
payload[:response] = ex.response # Doesn't work because `payload` isn't in scope here
end
However, if the block ever raises an exception (and I want that exception to bubble through), I have to write instrumented code like this (I think, I haven't actually tried it).
payload = {url: url, body: body}
exception = nil
instrument("client.write", payload) do
begin
response = client.write(url, body)
payload[:response] = response
response
rescue ClientError => ex
payload[:response] = ex.response # Still not sure if `payload` is in scope
exception = ex # Of if exception is as well
end
end
raise exception # Now I've also lost the original backtrace
Even if it does work, this code is far more complicated and error prone for the user.
I urge you to reconsider, these features are absolutely critical for anyone looking for an AS::Notifications replacement in non-Rails applications. I would completely agree with you if supporting these features made the notifications code way more complicated, but this is a fairly trivial amount of code to unlock some very powerful functionality.
I've rebased this branch with the devtools changes on master.
In general, we're looking at two things here:
- Support for adding more data to the payload within instrumentation blocks
- Support for rescuing and re-raising exceptions
That's why I'd appreciate if this PR could be split into two. I also think that exception handling could be implemented as either a plugin or configurable feature. I'm not sure yet if handling exceptions like in this PR by default is a good idea, especially that you'd like to handle Exception
which is usually risky business.
@paul does this make sense?
I wanted to add docs for these features, but it seems this gem doesn't have any documentation?
Yeah dry-monitor is still considered "too beta". We should be able to finally make it stable some time in 2020, then proper docs will be written.
Quick follow-up after talking to @mensfeld about it - he's got a very good point that handling exceptions within the same instrumentation block is an anti-pattern and it makes more sense to do something like this.
I gotta say my brain is not entirely "in the zone" here so I need some time to think about it further. Let's put this on hold for now and I'll get back to you with more thoughts later this month. We can of course continue discussion in the meantime.
@solnic Sure, I can split this into two PRs, no problem.
My reasoning behind rescuing Exception
rather than StandardError
is that in the situation where you're instrumenting some code that makes a network request, and the process gets killed by ^C (SystemExit
) or some other Interrupt
that inherits from Exception
rather than StandardError
, you probably still want to instrument that it happened. Depending on the network request and when it was interrupted, the "change" may have happened on the remote but if there's no corresponding instrumentation event fired.
It looks like @josevalim wrote it into AS::Notifications 10 years ago, but the commit doesn't explain why. I'll see if I can ask him if he remembers. https://github.com/rails/rails/commit/a76c7e68d5e39f5962d9cb85c98e6a8e96f8b3af
Edited to add: I think this makes the most sense if you're instrumenting something that has a log subscriber. Even if the process gets killed before the instrumented block finished, the logger should probably still get called to log the event. I suppose it really depends on what you're using dry-monitor for, if its for "logging and instrumentation" like AS::Notifications, or if its an event bus like Wisper. My use-case was logging and shipping timing metrics to influxdb which I want to happen even if my server dies.
I was fooling around with dry-system and enabled monitoring. I did a few tests with a UserRepo and actually expected the exception to be caught and the subscriber called, with an exception
key or something similar. I also expected the exception to bubble up at the top-level so that it could be caught by callers. Monitoring should be transparent to callers, that I fully agree with.
I did learn about rescuing exceptions within blocks, so thank you @paul for that new bit of knowledge!
@solnic I believe you meant to highlight this block of code? https://github.com/karafka/karafka/blob/c68e0b0462d016ea04e92e88e728d66b5aef7f42/lib/karafka/params/params.rb#L68-L76
Karafka's master changed since the time you created you link and I was confused for a bit because the lines that are highlighted today don't even have any exception handling code.
oh boy, I completely lost track on this one, sorry about that! I'll be working on dry-monitor 1.0.0 for hanami 2.0.0 release, so this will be a good moment for me to revisit this. We'll for sure take this library to the next level, so feedback like the one here is really appreciated.
For now I'm assigning this PR to 1.0.0
milestone.
@solnic I believe you meant to highlight this block of code? https://github.com/karafka/karafka/blob/c68e0b0462d016ea04e92e88e728d66b5aef7f42/lib/karafka/params/params.rb#L68-L76
@francois I don't even remember now, but most likely, yes.