opentelemetry-go-instrumentation icon indicating copy to clipboard operation
opentelemetry-go-instrumentation copied to clipboard

Custom SDK implementation

Open MrAlias opened this issue 1 year ago • 14 comments

Initial support for interoperability with manual instrumentation was added in https://github.com/open-telemetry/opentelemetry-go-instrumentation/pull/523. This approach has issues when trying to implement methods like IsRecording, RecordError, and AddEvent, as well as the inability to capture the SpanStartOptions which contain important definitions about the span.

Provide our own SDK

An alternate solution for this interoperability was proposed in https://github.com/open-telemetry/opentelemetry-go-instrumentation/issues/352#issuecomment-1811166718:

Provide a TracerProvider implementation from auto-instrumentation that manual instrumentation can explicitly use to say they want their data to go to the same pipeline. This means the auto-instrumentation will control the type that manual instrumentation uses to create, edit, and end spans if they want the data to be sent to the auto-instrumentation pipeline

Providing an SDK implementation like this would allow us to design minimal overhead in the Go implementation to resolve all the issues found with the current approach. Namely:

  • For IsRecording, the span implementation could read its value from the span struct field that could be set in the eBPF space
  • RecordError could call a method like `func (s span) recordError(errMsg, errName string, cfg trace.EventConfig). This method would have all the needed information for the probe already resolved.
  • Similar to RecordError, AddEvent could call a addEvent method designed to have all the information for the probe already resolved in the call arguments.

Interoperability

The remaining issues with providing a custom SDK, is how this would be integrated into users code?

The original idea was to offer this SDK as a package under go.opentelemetry.io/auto. This could still be provided, but if this is the only way this solution is integrated it means user code needs to be updated to inter-op with auto-instrumentation.

Requiring user code to be updated to work with auto-instrumentation is not desired. The whole point of auto-instrumentation is this kind of action would not be required.

Instead, we would like to have the default global otel implementation inter-op with this new SDK:

  • A flag of some sort (i.e. a package level boolean var) would be added go.opentelemetry.io/otel/internal/global.
  • This flag would be false by default
  • When the Go auto-instrumentation loads, it will set this flag to true
  • When the global Tracer implementation creates a new span it will check this added flag (i.e. here), and if true will return our custom SDK implementation of the Span instead of the nonRecordingSpan

Action Items

  • [ ] Prototype interoperability proposal to test its viability
  • [ ] Present proposal to Go SIG
    • Re-evaluate and adjust after feedback from Go SIG
  • [ ] Build custom SDK
  • [ ] Publish custom SDK

MrAlias avatar Jul 16 '24 19:07 MrAlias

@MrAlias What do you think should be included in the prototype?

RonFed avatar Jul 28 '24 06:07 RonFed

Ideally, something like this:

package main

import (
	"fmt"
	"time"
)

var Flag bool

func main() {
	for {
		fmt.Printf("Flag: %t", Flag)
		time.Sleep(time.Second)
	}
	// Output: Flag: false
	// Auto-instrumentation added
	// Output: Flag: true
}

MrAlias avatar Aug 06 '24 17:08 MrAlias

We may also need to look into having the global package call something like:

func (t *tracer) start(eBPFFlag *bool, ctx context.Context, name string, opts ...trace.SpanStartOption) (context.Context, trace.Span) {
/* ... */
}

and instrument that instead of Start so we can find the flag location.

MrAlias avatar Aug 06 '24 17:08 MrAlias

@MrAlias I did a POC for the second option we talked about. It is in https://github.com/open-telemetry/opentelemetry-go-instrumentation/compare/main...RonFed:opentelemetry-go-instrumentation_fork:flag_POC

Here are some logs from the app being instrumented and the agent:

rolldice-1 | 2024-08-09T15:42:14.291Z INFO app/main.go:64 starting http server {"port": ":8080"} rolldice-1 | 2024-08-09T15:42:14.291Z INFO app/main.go:52 probed function called with flag unset go-auto-1 | {"level":"info","ts":1723218134.4202611,"logger":"go.opentelemetry.io/auto","caller":"cli/main.go:86","msg":"building OpenTelemetry Go instrumentation ...","globalImpl":false} rolldice-1 | 2024-08-09T15:42:15.295Z INFO app/main.go:52 probed function called with flag unset rolldice-1 | 2024-08-09T15:42:16.297Z INFO app/main.go:52 probed function called with flag unset go-auto-1 | {"level":"info","ts":1723218136.4233193,"logger":"Instrumentation.Analyzer","caller":"process/discover.go:67","msg":"found process","pid":12030} go-auto-1 | {"level":"info","ts":1723218136.4287593,"logger":"Instrumentation.Allocate","caller":"process/allocate.go:73","msg":"Detaching from process","pid":12030} go-auto-1 | {"level":"info","ts":1723218136.4289303,"logger":"Instrumentation","caller":"app/instrumentation.go:144","msg":"target process analysis completed","pid":12030,"go_version":"1.22.5","dependencies":{"go.uber.org/multierr":"1.10.0","go.uber.org/zap":"1.27.0","std":"1.22.5"},"total_functions_found":3} go-auto-1 | {"level":"info","ts":1723218136.428993,"logger":"go.opentelemetry.io/auto","caller":"cli/main.go:119","msg":"starting instrumentation..."} go-auto-1 | {"level":"info","ts":1723218136.4319375,"logger":"Instrumentation.Manager","caller":"instrumentation/manager.go:198","msg":"loading probe","name":"net/http/server"} go-auto-1 | {"level":"info","ts":1723218136.4832294,"logger":"Instrumentation.Manager","caller":"instrumentation/manager.go:198","msg":"loading probe","name":"main/client"} go-auto-1 | {"level":"info","ts":1723218136.4842162,"logger":"go.opentelemetry.io/auto","caller":"cli/main.go:115","msg":"instrumentation loaded successfully"} rolldice-1 | 2024-08-09T15:42:17.299Z INFO app/main.go:50 probed function called with flag set rolldice-1 | 2024-08-09T15:42:18.300Z INFO app/main.go:50 probed function called with flag set rolldice-1 | 2024-08-09T15:42:19.301Z INFO app/main.go:50 probed function called with flag set rolldice-1 | 2024-08-09T15:42:20.302Z INFO app/main.go:50 probed function called with flag set rolldice-1 | 2024-08-09T15:42:21.303Z INFO app/main.go:50 probed function called with flag set rolldice-1 | 2024-08-09T15:42:22.304Z INFO app/main.go:50 probed function called with flag set rolldice-1 | 2024-08-09T15:42:23.305Z INFO app/main.go:50 probed function called with flag set rolldice-1 | 2024-08-09T15:42:24.307Z INFO app/main.go:50 probed function called with flag set rolldice-1 | 2024-08-09T15:42:25.308Z INFO app/main.go:50 probed function called with flag set rolldice-1 | 2024-08-09T15:42:26.309Z INFO app/main.go:50 probed function called with flag set

RonFed avatar Aug 09 '24 15:08 RonFed

Awesome! Let me see about getting an issue created in https://github.com/open-telemetry/opentelemetry-go to make the proposal based on these findings.

MrAlias avatar Aug 09 '24 21:08 MrAlias

opentelemetry-go proposal: https://github.com/open-telemetry/opentelemetry-go/issues/5702

@RonFed PTAL

MrAlias avatar Aug 09 '24 21:08 MrAlias

@MrAlias LGTM, In the suggested change, do you think we should probe the tracer.newSpan (which has the flag pointer) or the interop.newSpan (which has the configuration ready)? Ideally, we should build it in a way that we can handle both in a single eBPF program if it is possible.

RonFed avatar Aug 10 '24 08:08 RonFed

@MrAlias LGTM, In the suggested change, do you think we should probe the tracer.newSpan (which has the flag pointer) or the interop.newSpan (which has the configuration ready)? Ideally, we should build it in a way that we can handle both in a single eBPF program if it is possible.

I was thinking tracer.newSpan as that would allow us to set the bool and trace in one probe. Right?

MrAlias avatar Aug 10 '24 16:08 MrAlias

I was thinking tracer.newSpan as that would allow us to set the bool and trace in one probe. Right?

Yea, that sounds good, we'll need to have a return probe for that function as well simillar to what we have today - in order to save the returned span pointer in a map.

RonFed avatar Aug 10 '24 18:08 RonFed

I was thinking tracer.newSpan as that would allow us to set the bool and trace in one probe. Right?

Yea, that sounds good, we'll need to have a return probe for that function as well simillar to what we have today - in order to save the returned span pointer in a map.

Correct.

MrAlias avatar Aug 19 '24 16:08 MrAlias

With the custom sdk we're providing, do we still need the probes for the global instrumentation that we already have? Or could we just implement those functions in our sdk (Start, End, SetAttributes, etc)?

In that case we would only need to auto-instrument the custom sdk's ended function so that the object gets passed to the auto-agent's export pipeline.

damemi avatar Sep 09 '24 17:09 damemi

With the custom sdk we're providing, do we still need the probes for the global instrumentation that we already have? Or could we just implement those functions in our sdk (Start, End, SetAttributes, etc)?

In that case we would only need to auto-instrument the custom sdk's ended function so that the object gets passed to the auto-agent's export pipeline.

Correct. After we have opentelemetry-go use our SDK we won't need to instrument the nonRecordingSpan or tracer from go.openetelemetry.io/otel/internal/global.

MrAlias avatar Sep 09 '24 18:09 MrAlias

We will still need to instrument the initial call to tracer.Start to set the bool there stating the auto-instrumentation is registered.

MrAlias avatar Sep 09 '24 18:09 MrAlias

We will still need to instrument the initial call to tracer.Start to set the bool there stating the auto-instrumentation is registered.

Hmm, we could also look into just instrumenting a bool in our SDK that the global SDK would look up via our API. Might be something we should look into.

MrAlias avatar Sep 09 '24 18:09 MrAlias

With #1195 merged, the remaining tasks for this are to publish the sdk module and integrate it with the global API.

MrAlias avatar Oct 22 '24 16:10 MrAlias

Done.

There are still some clean-up tasks that can be addressed in their own issues.

MrAlias avatar Dec 10 '24 17:12 MrAlias