opentelemetry-instrumentations-js
opentelemetry-instrumentations-js copied to clipboard
chore: add an error attribute to span if response code is 500
When a GET remix.request returns a 500 status code, somehow it doesn't end up in the catch block. Due to this the server response is not marked as an error and there is no error stack trace as well.
I'm fairly new to open telemetry and I maybe wrong about this. Should there be any other way to handle the internal server error returned by remix.request instance?
So I think generally I'm open to adding an error: true attribute when the response is an http status 5xx. That's a pretty common pattern.
there is no error stack trace as well. If the response is a 500 then we're not guaranteed to have an error stack or message. Sometimes the response can include that information, but not in a reliable way.
For completeness (and my future self), there are four ways to accomplish this:
- Rely on something like the OpenTelemetry Collector to add the attribute based on the status code. This is probably the "recommended" answer, but I recognize that a lot of projects might not be "big enough" to use that.
- You could write some OpenTelemetry nodejs processors as part of your pipeline (ie https://github.com/open-telemetry/opentelemetry-js/discussions/2817) which has the advantage that you can really do whatever you want to the span based on the currently set attributes. In this case you could process the
http.statusvalue and appropriately add anerror: true, but wouldn't be able to read any other response details. - Add a function to the RemixInstrumentation config that can process the response objects and add span attributes. This means you can better augment the existing spans with any information that might be unique to your environment, which is nice, and means we don't have to support the full spectrum of attributes.
- Add the
error: truetag forhttp.status: 5xx. You won't get custom handling of error messages, but you'd at least get some initial error support in whatever tool you're using to analyze.
This PR is implementing option (4), which in this case I think is okay, but I'll add some code comments on how to better organize it.
If you're looking to get error messages and status codes though we'll need to think through that a bit more, so just let me know.