opentelemetry-go-instrumentation
Custom Probes
There are many packages that this project does not instrument, some open-source and others closed-source. There should be a way for users to provide a custom Probe definition that supplies instrumentation for these unsupported packages.
That's a great idea. I think one of the steps in getting there is to refine our Probe interface, maybe worth opening another issue for that? I have some ideas for improving the existing interface.
Yeah, I think a refined, exposed Probe interface inherently provides the mechanism for custom probes. But it's good to call out that that is intentional.
> That's a great idea. I think one of the steps in getting there is to refine our Probe interface, maybe worth opening another issue for that? I have some ideas for improving the existing interface.
I'm happy to have this issue be the place we discuss the needed probe changes. :+1:
I'll try to list the main points I was thinking about:
Changes to the current functions
`Load(*link.Executable, *process.TargetDetails, config.Sampler) error`
Currently, this function has 2 main purposes: 1) loading the eBPF programs and maps into the kernel. 2) attaching the programs to the relevant symbols via uprobes.
- Splitting this function into 2 might be better: `Load` and `Attach`. This separation might make the code clearer, improve error handling, and enable some added functionality (loading a program once and attaching it to different places at runtime); see the sketch below.
- In the future this function will also be responsible for setting the initial configuration (for example sampling, attributes collection, size of the perf buffer, etc.). If we decide to split the function, this part probably fits better in the `Attach` function. In addition, the general configuration struct will need to be decided on instead of passing the sampling config alone.
- Passing `context` here.
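A rough sketch of what the split with `context` could look like (the signatures and placeholder types here are illustrative assumptions, not settled API):

```go
package probe // hypothetical package name for illustration

import (
	"context"

	"github.com/cilium/ebpf/link"
)

// TargetDetails stands in for the project's process.TargetDetails.
type TargetDetails struct{ /* ... */ }

// Config is a placeholder for the general configuration struct
// discussed above (sampling, attribute collection, buffer sizes, ...).
type Config struct{ /* ... */ }

// Probe sketches the proposed Load/Attach split, with context passed in.
type Probe interface {
	// Load loads the eBPF programs and maps into the kernel.
	Load(ctx context.Context, exe *link.Executable, td *TargetDetails) error
	// Attach attaches the loaded programs to the target symbols via
	// uprobes and applies the initial configuration.
	Attach(ctx context.Context, cfg Config) error
}
```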
`Run(eventsChan chan<- *Event)`
- The event type will probably change to something from `pdata`, as was suggested.
- This function might not be relevant for all probes. I'll write more on that in the next section.
- Do we want to pass `context` here? Currently the cancellation of this function is done by closing the perf buffer, which causes the reading loop to exit (see the sketch below).
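For illustration, a context-aware `Run` could tie cancellation to the perf reader like this sketch (assuming the cilium/ebpf `perf.Reader`; `probe`, `Event`, and `decodeEvent` are placeholders invented here):

```go
import (
	"context"

	"github.com/cilium/ebpf/perf"
)

// Event is a placeholder for the probe event type.
type Event struct{ /* ... */ }

type probe struct {
	reader *perf.Reader
}

// decodeEvent is a hypothetical helper turning a raw perf record
// into the probe's event type.
func decodeEvent(rec perf.Record) *Event { return &Event{} }

// Run loops reading the perf buffer. Cancelling ctx closes the
// reader, which unblocks the blocking Read call and ends the loop.
func (p *probe) Run(ctx context.Context, events chan<- *Event) {
	go func() {
		<-ctx.Done()
		p.reader.Close() // unblocks p.reader.Read() below
	}()
	for {
		record, err := p.reader.Read()
		if err != nil {
			return // the reader was closed (or a fatal read error)
		}
		events <- decodeEvent(record)
	}
}
```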
New function suggestions
`Status() Status`
This will return the status of the probe (running, closed, error). It will be useful for constructing a higher-level API which reports a general status of all the probes.
`SupportedConfiguration() Config`
Each probe can have a different set of configurations it supports (for example: collecting HTTP headers, including `url.path` in the span name, collecting `db.query.text`, sampling by Kafka topic, ...). This function will allow each probe to declare which configuration it supports.
`ApplyConfig(Config)`
Changes the configuration of the probe by applying the supplied one.
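Taken together, a rough sketch of these suggested additions (all names here are tentative, not settled API):

```go
// Status reports the lifecycle state of a probe.
type Status int

const (
	StatusRunning Status = iota
	StatusClosed
	StatusError
)

// Config is a placeholder for whatever configuration type is settled on.
type Config struct{ /* ... */ }

// Probe additions sketched from the suggestions above.
type Probe interface {
	// Status returns the probe's current state so a higher-level API
	// can report an aggregate status across all probes.
	Status() Status
	// SupportedConfiguration declares which configuration options this
	// probe understands (HTTP headers, db.query.text, sampling, ...).
	SupportedConfiguration() Config
	// ApplyConfig changes the probe's configuration by applying the
	// supplied one.
	ApplyConfig(Config)
}
```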
Different types of probes
Our current interface assumes each probe reports events. This is not necessarily the case. For example, in #954 we want a probe which writes a value to a boolean variable. This kind of probe acts as a job: it has a dedicated task and should not be running in the background. Maybe we want a base Probe interface, and 2 additional interfaces which embed it: ReportingProbe and JobProbe?
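A sketch of that embedding idea, with `Do` as a hypothetical method name for the job-style task:

```go
// Event is a placeholder for the probe event type.
type Event struct{ /* ... */ }

// Probe is the shared base behavior (arguments elided for brevity).
type Probe interface {
	Load() error
	Close() error
}

// ReportingProbe runs in the background and streams events.
type ReportingProbe interface {
	Probe
	Run(eventsChan chan<- *Event)
}

// JobProbe performs a one-off task, such as writing a value to a
// target variable (#954), and does not keep running in the background.
type JobProbe interface {
	Probe
	Do() error
}
```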
@RonFed this is a great overview, thanks for writing that up
- +1 to the idea of splitting `Load` and `Attach`. I think having atomic operations is much clearer and allows for more extension points. We can always have a `LoadAndAttach` later for syntactic sugar if it's that convenient for users.
- +1 to passing context in `Run()` rather than implicitly exiting when the perf buffer is closed. But, could this lead to the perf buffer never being cleaned up?
- What are you thinking for some use cases for the Status() function?
- What's the purpose of SupportedConfiguration()? Would I call that before instantiating a Probe, then cast it to the type of config for that Probe? Couldn't each Probe just export a Config struct that you can use directly?
Different types of probes make sense too. Could you generalize it even further to something like SyncProbe (does an action) and AsyncProbe (listens for events)?
From the SIG meeting, my thinking is to base Probe config off the k8s scheduler framework (specifically the Plugin API).
If we have 2 types of probe, ProbeA and ProbeB, we could (roughly) have the following API:
So our root API would provide (something like)
```go
// go.otel.io/auto/api
// Base types for probe and config

type Config interface {
	Package() string
}

type Probe interface {
	ApplyConfig(Config)
}
```
And we also require a standard New() function that's used by our auto-inst SDK to create the generic probe interface and call functions like Load() and Attach()
Then a user (or we) could import it and implement a custom probe:
```go
// github.com/me/foo-auto
// ProbeA implementation

type ProbeA interface {
	Probe
	DoProbeAThing()
}

type ProbeAConfig struct {
	CustomA       string
	CustomSetting int
}

func (a *ProbeAConfig) Package() string {
	return a.CustomA
}

// New constructs the probe from the generic config type.
func New(probeConfig otelauto.Config) otelauto.Probe {
	probeAConfig := probeConfig.(*ProbeAConfig)
	doThing(probeAConfig.CustomSetting)
	// return some concrete type implementing ProbeA
	return &probeAImpl{ /* ... */ }
}
```
There are some examples of this in place in this scheduler repo. I've just always thought it was a cool approach and think it fits here too.
@damemi I'm not sure I understood your suggestion. Assuming our current probe for simplicity, can you give an example of how configuration of HTTP or DB will look?
@RonFed I'll try to put together a more concrete prototype, I was looking into it and got kind of sidetracked just thinking about the overall structure.
On that note, I have some ideas of what the final layout could look like for the Probe api that might help as an end goal:
- Move `/internal/pkg/instrumentation/probe` to a root `/pkg/probe`
  - i.e. `go.opentelemetry.io/auto/pkg/probe`
  - It makes sense to keep the `Probe` API decoupled from the entire `Instrumentation` root package so that custom probes can be defined in isolation and imported without depending on the full auto-instrumentation SDK. In other words, users (and us) should be able to write a Probe that is entirely its own module (similar to collector-contrib)
- As proof, move our default probes from `/internal/pkg/instrumentation/bpf` to a root `/probes`
  - for example `go.opentelemetry.io/auto/probes/net/http`
  - If we define a modular Probe API that's done right then our default probes should also be built with it. This detangles our codebase and serves as documentation for writing custom probes
  - This also provides them externally as a "probe library". We don't even need to worry about an ownership model for custom probes if users can define them as their own modules in their own repos (unless we want to offer a "contrib build", which I don't think is on the table right now)
- Finally, wire probes through `NewInstrumentation()`
Something like:
```go
import (
	"context"

	"go.opentelemetry.io/auto"
	"go.opentelemetry.io/auto/probes/net/http"

	"github.com/damemi/customprobe"
)

func main() {
	ctx := context.Background()
	inst, err := auto.NewInstrumentation(ctx,
		auto.WithProbes(
			http.New(),
			customprobe.New(),
		),
	)
	// ...
}
```
Is this kind of what people were thinking for a public Probe api?
@damemi Yes, that is what I was thinking about as well in terms of our end goal in this issue.
@RonFed sounds good, so to show what I'm saying I did an example with db/http here: https://github.com/open-telemetry/opentelemetry-go-instrumentation/pull/1123
Could you show what you mean for a SupportedConfig() method like you proposed? I might just be missing the use case for it, or maybe we're talking about the same thing
As part of this (and shifting Probes to `NewInstrumentation()`) we would also define a method signature for a probe's `New()` func that enforces passing `probe.Config` as an argument.
@damemi The configuration example looks great to me.
Is the `Package()` function in the config interface just an example? I agree with the need for a config interface, just not sure about its functions.
My initial idea was to have a struct common to all the probes (including some generic maps); in that scenario, the `SupportedConfig()` function would return the keys of the map, letting the user know which configurations are available.
However, the approach in your PR is better, so that function is not necessary if all the configuration structs are exported.
@RonFed yeah I just used Package() as an example of something that all Probes might need to expose to the Instrumentation machinery (in this case I used it just for logging).
Overall this is really pretty much exactly how the Collector does config for different components. It's kind of similar to the k8s scheduler like I mentioned.
But yes, I think it's better to have the configuration exposed in a concrete type that's owned by the Probe itself, so that's the main idea with this approach.
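To make that concrete, a sketch of a probe package exporting its own config type (the package path and field names are invented for illustration):

```go
// Hypothetical net/http probe package owning its own concrete config.
package http

// Config holds the settings this probe supports. It satisfies the
// root config interface through Package().
type Config struct {
	CaptureHeaders bool // collect HTTP headers
	PathInSpanName bool // include url.path in the span name
}

// Package reports the instrumented package, used by the
// Instrumentation machinery (e.g. for logging).
func (Config) Package() string { return "net/http" }
```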
SIG meeting notes:
- We need to decouple the `structfield.ID` in the Manifest.
  - This needs to look at offset generation
  - Possibly move offsets into a per-probe generation, which would require a big overhaul of the offset generator.
- The `Run` method needs to be updated to transport `ptrace.Traces`
An interesting thought: with custom probes, this opens up the possibility of instrumenting binaries other than Go (e.g. Rust, C++). Especially if the offsets are self-contained in the probe.
@damemi I think we can try to add a configuration layer similar to your example as a first step.
This will need to be integrated with the definitions in `provider.go`.
WDYT?
The current `Run` and `Close` methods are defined as follows:
```go
type Probe interface {
	// ...
	Run(tracesChan chan<- ptrace.ScopeSpans)
	Close() error
}
```
Issues
The channel passed to `Run` is closed by another process after it calls `Close`
When a process wants to close probes, it first calls `Close`, and when that call has returned it assumes `Run` is complete so it can close the channel. The telemetryChan is not guaranteed to be unused, though. There may be incorrect implementations, or just concurrency patterns, that mean something is still trying to send on the telemetryChan.
Sending on a closed channel will cause a panic. Therefore, this design should be changed.
Instead of having `Run` receive a write-only channel, have it return a read-only channel that it writes to from a spawned goroutine. When `Close` is called, and the probe is sure it is done writing to the returned channel, it can close that channel and ensure no panic is raised.
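A minimal sketch of that ownership inversion, assuming the channel still carries `ptrace.ScopeSpans` at this point and using placeholder internal fields:

```go
import "go.opentelemetry.io/collector/pdata/ptrace"

type probe struct {
	out  chan ptrace.ScopeSpans
	done chan struct{}
}

// Run returns a receive-only channel owned by the probe. Only the
// probe's goroutine sends on it, so only the probe closes it.
func (p *probe) Run() <-chan ptrace.ScopeSpans {
	go func() {
		// Safe: the sole writer closes the channel once it can no
		// longer send, so receivers never panic.
		defer close(p.out)
		for {
			select {
			case <-p.done:
				return
			default:
				// Read from the perf buffer and send, e.g.:
				// p.out <- spans
			}
		}
	}()
	return p.out
}

// Close signals the writer goroutine to stop. After the writer exits
// it closes the returned channel, ending the receiver's range loop.
func (p *probe) Close() error {
	close(p.done)
	return nil
}
```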
The channel currently only supports trace telemetry
If the Probe produces log, metric, or other telemetry in the future, it will need a forward-compatible way to send this data.
Instead of having `Run` use a channel defined on `ptrace.ScopeSpans`, introduce a new `probe.Telemetry` type:

```go
// Telemetry is the data read by a probe.
type Telemetry struct {
	ScopeSpans ptrace.ScopeSpans
}
```
This new type can have fields added for additional signals when they are supported.
Proposal
The Run method would become:
```go
type Probe interface {
	// ...
	Run() <-chan Telemetry
	Close() error
}
```
Having this include an error may also help communicate errors from the probe. For example:
```go
// Telemetry is the data read by a probe.
type Telemetry struct {
	ScopeSpans ptrace.ScopeSpans
	// Error is the error encountered when reading the telemetry data.
	Error error
}
```
Based on https://github.com/open-telemetry/opentelemetry-go-instrumentation/pull/1248, this proposal should be revised to:
```go
type Probe interface {
	// ...
	Run(func(Telemetry))
	Close() error
}
```
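For illustration, a probe's read loop under this callback-based signature might look like the following sketch (`p.reader` and `p.decode` are placeholders for the probe's perf reader and decoding logic):

```go
// Run blocks, invoking handle for each batch of telemetry read from
// the perf buffer, until Close stops the underlying reader.
func (p *probe) Run(handle func(Telemetry)) {
	for {
		record, err := p.reader.Read()
		if err != nil {
			return // reader closed by Close
		}
		// The probe owns no output channel here; delivery and any
		// buffering are the caller's concern inside handle.
		handle(Telemetry{ScopeSpans: p.decode(record)})
	}
}
```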
@MrAlias and I were talking about this last week, and I think the points we came up with for the design were (let me know if I missed anything):
- It should be rich enough to cover different Probes and bpf program types
- It should be extensible and generic enough to cover use cases we haven't thought of yet
- Probes are the first API we should work out in our broader library refactor, because they unlock design in other areas (like the Manager and OTel Pipeline)
Tyler suggested a factory pattern for this, and I agree that would probably check all the boxes. I can work on a POC for that this week. I'd like to implement this in a way that first wraps the current Probes so that we can transition one by one in smaller PRs.
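For reference, a rough shape such a factory could take (hypothetical, loosely modeled on the Collector's component factories; the method set is not settled):

```go
// Factory creates probes on demand, so the manager can decide per
// target process which probes to construct and load.
type Factory interface {
	// Manifest describes what the probe instruments without loading
	// anything into the kernel.
	Manifest() Manifest
	// NewProbe builds a probe instance for the given configuration.
	NewProbe(cfg Config) (Probe, error)
}
```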
The factory pattern will be good to start internally. To fully support this for external users we will need to support multi-target running (https://github.com/open-telemetry/opentelemetry-go-instrumentation/issues/197).
I think the Probe interface should have a way of stating which versions it supports - This will help with cases like https://github.com/open-telemetry/opentelemetry-go-instrumentation/pull/1302
> I think the Probe interface should have a way of stating which versions it supports - This will help with cases like #1302
Is this something you see the Manifest returned from a probe handling?
> Is this something you see the Manifest returned from a probe handling?
Yea, that makes sense. We can add library version support and Go version support as part of the Manifest. That way the manager can figure out what can be loaded.
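For example, the Manifest could carry version constraints like the following (field names and the constraint format are assumptions, not settled API):

```go
// Hypothetical version-constraint fields on the Manifest.
type Manifest struct {
	// ...

	// GoVersionConstraint is the range of Go versions the probe's
	// offsets and uprobes are known to work with, e.g. ">=1.20 <1.23".
	GoVersionConstraint string

	// LibraryVersionConstraints maps instrumented package import paths
	// to supported version ranges, so the manager can skip loading a
	// probe that does not match the target binary.
	LibraryVersionConstraints map[string]string
}
```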
Meeting notes:
- Deep dive into how Odigos is using the project
  - Event loop
    - Process detection module provides events
      - Uses eBPF for process detection
    - Configuration events
    - Results
  - Call the Go auto-instrumentation
    - Start
    - Ended
    - Configured
  - Also call other Odigos modules
- The fact that daemons need global state means that we may not want to build them into the probe API. This feature/coordination will be in the CLI/Beyla donation/Odigos agent. The auto-instrumentation Go project will be scoped to running “probes” that instrument Go applications for a single process.
- Process detection should be moved out (only accept a PID and have process management outside of `Instrumentation`)
- Importing this into Beyla or Odigos has a binary-in-git issue. #1233 needs resolution.
+1 to these points, this benefits go-auto by scoping the work of this library to a reasonable amount, and supports users (vendors) implementing the coordination/detection/management in their own ways, which IMO is kind of the point of this project as a library. This gives us a clearer definition of what this repo is meant to do too.