req
req copied to clipboard
Telemetry events
This issue is a starting point to discuss what the API for Telemetry events (emitted by Req) will look like.
An initial sketch would be to have default events like this, resembling a telemetry:span/3
.
-
[:req, :request, :pipeline, :start]
-
[:req, :request, :pipeline, :stop]
-
[:req, :request, :pipeline, :exception]
These would measure the entire request pipeline.
I would love if we could also customize the prefix, something like this:
Req.new(
# ...,
telemetry: [
prefix: [:my_app, :stripe_client]
]
)
Resulting in events such as [:my_app, :stripe_client, :pipeline, ...]
.
Steps
At some point we might want to instrument steps, such as JSON decoding/encoding, but I’m not sure this is critical to get this out the door. Adding Telemetry events is not a breaking change, so this is not a blocker for 1.0.
cc @wojtekmach
One important consideration: retries. We should think of how to instrument those, because they should not be instrumented like the whole request pipeline. Otherwise, you might end up with requests that look like they took seconds because they were retried a few times. Just putting this out there.
I think you need at least two telemetry IMO,
-
[:req, request, *]
- where such telemetry is done by middleware or by ensuring it runs at the very end of the pipeline. It would be the same as https://github.com/elixir-tesla/tesla/blob/master/lib/tesla/middleware/telemetry.ex -
[:req, :pipeline, *]
- it is about the middleware pipeline without including the last step, the request telemetry step that I mentioned before (pending in Tesla, but thought on doing this)
May I add: we definitely need to have the url
template with path_params
as part of the metadata, so e.g. we can get useful metrics time series. I heavily use this in Tesla (with PathParams
and KeepRequest
middleware).
Agreed, thank you for mentioning it here.
One important consideration: retries. We should think of how to instrument those, because they should not be instrumented like the whole request pipeline. Otherwise, you might end up with requests that look like they took seconds because they were retried a few times. Just putting this out there.
One good example I've seen is in the stripe hex lib, where the retry count is one of the metadata fields. Works pretty ok.
Yeah I think emitting events and having the retry information in the event itself would be enough for other libs to build stuff on top of them.