opentelemetry-go-instrumentation
opentelemetry-go-instrumentation copied to clipboard
Custom SDK implementation
Initial support for interoperability with manual instrumentation was added in https://github.com/open-telemetry/opentelemetry-go-instrumentation/pull/523. This approach has issues when trying to implement methods like IsRecording, RecordError, and AddEvent, as well as the inability to capture the SpanStartOptions which contain important definitions about the span.
Provide our own SDK
An alternate solution for this interoperability was proposed in https://github.com/open-telemetry/opentelemetry-go-instrumentation/issues/352#issuecomment-1811166718:
Provide a TracerProvider implementation from auto-instrumentation that manual instrumentation can explicitly use to say they want their data to go to the same pipeline. This means the auto-instrumentation will control the type that manual instrumentation uses to create, edit, and end spans if they want the data to be sent to the auto-instrumentation pipeline
Providing an SDK implementation like this would allow us to design minimal overhead in the Go implementation to resolve all the issues found with the current approach. Namely:
- For
IsRecording, the span implementation could read its value from the spanstructfield that could be set in the eBPF space RecordErrorcould call a method like `func (s span) recordError(errMsg, errName string, cfg trace.EventConfig). This method would have all the needed information for the probe already resolved.- Similar to
RecordError,AddEventcould call aaddEventmethod designed to have all the information for the probe already resolved in the call arguments.
Interoperability
The remaining issues with providing a custom SDK, is how this would be integrated into users code?
The original idea was to offer this SDK as a package under go.opentelemetry.io/auto. This could still be provided, but if this is the only way this solution is integrated it means user code needs to be updated to inter-op with auto-instrumentation.
Requiring user code to be updated to work with auto-instrumentation is not desired. The whole point of auto-instrumentation is this kind of action would not be required.
Instead, we would like to have the default global otel implementation inter-op with this new SDK:
- A flag of some sort (i.e. a package level boolean
var) would be addedgo.opentelemetry.io/otel/internal/global. - This flag would be
falseby default - When the Go auto-instrumentation loads, it will set this flag to
true - When the global
Tracerimplementation creates a new span it will check this added flag (i.e. here), and iftruewill return our custom SDK implementation of theSpaninstead of thenonRecordingSpan
Action Items
- [ ] Prototype interoperability proposal to test its viability
- [ ] Present proposal to Go SIG
- Re-evaluate and adjust after feedback from Go SIG
- [ ] Build custom SDK
- [ ] Publish custom SDK
@MrAlias What do you think should be included in the prototype?
Ideally, something like this:
package main
import (
"fmt"
"time"
)
var Flag bool
func main() {
for {
fmt.Printf("Flag: %t", Flag)
time.Sleep(time.Second)
}
// Output: Flag: false
// Auto-instrumentation added
// Output: Flag: true
}
We may also need to look into having the global package call something like:
func (t *tracer) start(eBPFFlag *bool, ctx context.Context, name string, opts ...trace.SpanStartOption) (context.Context, trace.Span) {
/* ... */
}
and instrument that instead of Start so we can find the flag location.
@MrAlias I did a POC for the second option we talked about. It is in https://github.com/open-telemetry/opentelemetry-go-instrumentation/compare/main...RonFed:opentelemetry-go-instrumentation_fork:flag_POC
Here are some logs from the app being instrumented and the agent:
rolldice-1 | 2024-08-09T15:42:14.291Z INFO app/main.go:64 starting http server {"port": ":8080"} rolldice-1 | 2024-08-09T15:42:14.291Z INFO app/main.go:52 probed function called with flag unset go-auto-1 | {"level":"info","ts":1723218134.4202611,"logger":"go.opentelemetry.io/auto","caller":"cli/main.go:86","msg":"building OpenTelemetry Go instrumentation ...","globalImpl":false} rolldice-1 | 2024-08-09T15:42:15.295Z INFO app/main.go:52 probed function called with flag unset rolldice-1 | 2024-08-09T15:42:16.297Z INFO app/main.go:52 probed function called with flag unset go-auto-1 | {"level":"info","ts":1723218136.4233193,"logger":"Instrumentation.Analyzer","caller":"process/discover.go:67","msg":"found process","pid":12030} go-auto-1 | {"level":"info","ts":1723218136.4287593,"logger":"Instrumentation.Allocate","caller":"process/allocate.go:73","msg":"Detaching from process","pid":12030} go-auto-1 | {"level":"info","ts":1723218136.4289303,"logger":"Instrumentation","caller":"app/instrumentation.go:144","msg":"target process analysis completed","pid":12030,"go_version":"1.22.5","dependencies":{"go.uber.org/multierr":"1.10.0","go.uber.org/zap":"1.27.0","std":"1.22.5"},"total_functions_found":3} go-auto-1 | {"level":"info","ts":1723218136.428993,"logger":"go.opentelemetry.io/auto","caller":"cli/main.go:119","msg":"starting instrumentation..."} go-auto-1 | {"level":"info","ts":1723218136.4319375,"logger":"Instrumentation.Manager","caller":"instrumentation/manager.go:198","msg":"loading probe","name":"net/http/server"} go-auto-1 | {"level":"info","ts":1723218136.4832294,"logger":"Instrumentation.Manager","caller":"instrumentation/manager.go:198","msg":"loading probe","name":"main/client"} go-auto-1 | {"level":"info","ts":1723218136.4842162,"logger":"go.opentelemetry.io/auto","caller":"cli/main.go:115","msg":"instrumentation loaded successfully"} rolldice-1 | 2024-08-09T15:42:17.299Z INFO app/main.go:50 probed function called with flag set rolldice-1 | 2024-08-09T15:42:18.300Z INFO app/main.go:50 probed function called with flag set rolldice-1 | 2024-08-09T15:42:19.301Z INFO app/main.go:50 probed function called with flag set rolldice-1 | 2024-08-09T15:42:20.302Z INFO app/main.go:50 probed function called with flag set rolldice-1 | 2024-08-09T15:42:21.303Z INFO app/main.go:50 probed function called with flag set rolldice-1 | 2024-08-09T15:42:22.304Z INFO app/main.go:50 probed function called with flag set rolldice-1 | 2024-08-09T15:42:23.305Z INFO app/main.go:50 probed function called with flag set rolldice-1 | 2024-08-09T15:42:24.307Z INFO app/main.go:50 probed function called with flag set rolldice-1 | 2024-08-09T15:42:25.308Z INFO app/main.go:50 probed function called with flag set rolldice-1 | 2024-08-09T15:42:26.309Z INFO app/main.go:50 probed function called with flag set
Awesome! Let me see about getting an issue created in https://github.com/open-telemetry/opentelemetry-go to make the proposal based on these findings.
opentelemetry-go proposal: https://github.com/open-telemetry/opentelemetry-go/issues/5702
@RonFed PTAL
@MrAlias LGTM,
In the suggested change, do you think we should probe the tracer.newSpan (which has the flag pointer) or the interop.newSpan (which has the configuration ready)?
Ideally, we should build it in a way that we can handle both in a single eBPF program if it is possible.
@MrAlias LGTM, In the suggested change, do you think we should probe the
tracer.newSpan(which has the flag pointer) or theinterop.newSpan(which has the configuration ready)? Ideally, we should build it in a way that we can handle both in a single eBPF program if it is possible.
I was thinking tracer.newSpan as that would allow us to set the bool and trace in one probe. Right?
I was thinking tracer.newSpan as that would allow us to set the bool and trace in one probe. Right?
Yea, that sounds good, we'll need to have a return probe for that function as well simillar to what we have today - in order to save the returned span pointer in a map.
I was thinking tracer.newSpan as that would allow us to set the bool and trace in one probe. Right?
Yea, that sounds good, we'll need to have a return probe for that function as well simillar to what we have today - in order to save the returned span pointer in a map.
Correct.
With the custom sdk we're providing, do we still need the probes for the global instrumentation that we already have? Or could we just implement those functions in our sdk (Start, End, SetAttributes, etc)?
In that case we would only need to auto-instrument the custom sdk's ended function so that the object gets passed to the auto-agent's export pipeline.
With the custom sdk we're providing, do we still need the probes for the global instrumentation that we already have? Or could we just implement those functions in our sdk (Start, End, SetAttributes, etc)?
In that case we would only need to auto-instrument the custom sdk's
endedfunction so that the object gets passed to the auto-agent's export pipeline.
Correct. After we have opentelemetry-go use our SDK we won't need to instrument the nonRecordingSpan or tracer from go.openetelemetry.io/otel/internal/global.
We will still need to instrument the initial call to tracer.Start to set the bool there stating the auto-instrumentation is registered.
We will still need to instrument the initial call to
tracer.Startto set theboolthere stating the auto-instrumentation is registered.
Hmm, we could also look into just instrumenting a bool in our SDK that the global SDK would look up via our API. Might be something we should look into.
With #1195 merged, the remaining tasks for this are to publish the sdk module and integrate it with the global API.
Done.
There are still some clean-up tasks that can be addressed in their own issues.