opentelemetry-go-instrumentation
Instrument `RecordError` from the default OTel global impl
Originally posted by @MrAlias in https://github.com/open-telemetry/opentelemetry-go-instrumentation/pull/523#discussion_r1408318459
There are going to be a few complex issues to address in this issue:
- The first argument passed is a Go `error` interface
- The method accepts a variadic argument of `trace.EventOption`s
- This needs to produce a span event
cc @RonFed any thoughts on how to solve/address some of these?
The first argument passed is a Go error interface
Similar to the default Go SDK implementation we need to convert both the error type and content into semantic conventions:
https://github.com/open-telemetry/opentelemetry-go/blob/e8c22e6e7180056cbb229a474c8ee98c7696ce07/sdk/trace/span.go#L457-L460
The error type information should be interpretable from the structure of the Go interface type itself if it has a name (likely min version of Go needed to support this). This intuition comes from the fact that the `reflect` package can determine this information.
It is not obvious how the error string representation can be determined from within the eBPF scope. To satisfy the interface, the implementation needs to have an `Error` method, but I'm not sure it is possible to call that from the eBPF space. Not sure how to solve this(?).
The method accepts a variadic argument of trace.EventOptions
This builds on the issue identified above: how do we apply the `trace.EventOption`s to a `trace.EventConfig` in eBPF space to understand the configuration required for the event? For example, how do we do this in eBPF space:
https://github.com/open-telemetry/opentelemetry-go/blob/e8c22e6e7180056cbb229a474c8ee98c7696ce07/sdk/trace/span.go#L462
This needs to produce a span event
This means resolving this issue will require solving most, if not all, of the event pipeline needed to resolve #541.
@MrAlias Those are great points.
> It is not obvious how the error string representation can be determined from within the eBPF scope. To satisfy the interface, the implementation needs to have an `Error` method, but I'm not sure it is possible to call that from the eBPF space. Not sure how to solve this(?).
There are 2 approaches to get the error string:
- The first and the simpler one is reading the string from the error concrete type. When calling `errors.New` or `fmt.Errorf` the concrete structs contain the error message as the first field in the struct. This probably covers a significant percentage of the cases but it is not a 100% solution since it won't support custom errors.
- Identifying the interface is possible by reading the `itab` pointer of the interface: https://github.com/golang/go/blob/dfaaa91f0537f806a02ff2dd71b79844cd16cc4e/src/runtime/runtime2.go#L205 Identifying what is the concrete type of the interface is very useful in many contexts in our instrumentation, but in the error context, it still won't completely help us since the error struct can be an arbitrary one.
> This builds on the issue identified above: how do we apply the `trace.EventOption`s to a `trace.EventConfig` in eBPF space to understand the configuration required for the event? For example, how do we do this in eBPF space:
This also comes down to identifying the concrete type of each passed option. Assuming we know the concrete type for each option, we can use offsetgen to read the options and apply them to the span created in eBPF.
I think adding the capability to identify interfaces is achievable and will unlock cool functionality.
> When calling `errors.New` or `fmt.Errorf` the concrete structs contain the error message as the first field in the struct. This probably covers a significant percentage of the cases but it is not a 100% solution since it won't support custom errors.
I worry adding a partial solution like this may do more harm than good. Partial support may lead users into a false sense of security that this feature is supported, but they end up with missing data in the end. Given errors are a critical part of monitoring service health, this could be a pretty severe issue.
I'm not aware of a better way to handle it though. Which makes me wonder if we should just document that this functionality is not supported instead.
@open-telemetry/go-instrumentation-approvers thoughts?